SYSTEMATIC SCAN FOR SAMPLING COLORINGS

By Martin Dyer, Leslie Ann Goldberg and Mark Jerrum 

University of Leeds, University of Warwick and University of Edinburgh 

We address the problem of sampling colorings of a graph G by 
Markov chain simulation. For most of the article we restrict attention
to proper q-colorings of a path on n vertices (in statistical physics
terms, the one-dimensional q-state Potts model at zero temperature),
though in later sections we widen our scope to general "H-colorings"
of arbitrary graphs G. Existing theoretical analyses of the mixing time 
of such simulations relate mainly to a dynamics in which a random 
vertex is selected for updating at each step. However, experimen- 
tal work is often carried out using systematic strategies that cycle 
through coordinates in a deterministic manner, a dynamics some- 
times known as systematic scan. The mixing time of systematic scan 
seems more difficult to analyze than that of random updates, and 
little is currently known. In this article we go some way toward cor- 
recting this imbalance. By adapting a variety of techniques, we derive 
upper and lower bounds (often tight) on the mixing time of system- 
atic scan. An unusual feature of systematic scan as far as the analysis 
is concerned is that it fails to be time reversible. 

1. Introduction. Many models in statistical physics come under the head- 
ing of "spin systems." Such a system is specified by a graph G, in our case 
finite. Configurations of the system are assignments of "spins" to the vertices 
of G. There are assumed to be q possible spins, and, hence, potentially $q^n$
configurations, where n is the number of vertices of G, though some of these 
configurations may be illegal. Each configuration has an energy that comes 
from summing, over all edges of G, the interaction energies between adjacent 
spins. These energies specify a probability distribution, called the Boltzmann
distribution, on configurations. The Potts model and the hard-core lattice
gas model are examples of spin systems. 

In this paper, for consistency with previous literature, we shall refer to
spins as colors and to configurations as states. Sampling from the Boltz- 
mann distribution is a challenging computational task. Often, the only fea- 
sible way of going about it is to simulate a suitable random "dynamics" on 
configurations. The dynamics has the property of converging to a stationary 
distribution which is the Boltzmann distribution. This is usually straight- 
forward to arrange. The hard part is proving that the dynamics is "rapidly 
mixing," that is, converges rapidly to stationarity. 

Identifying the vertices of G with the integers {1,2, ... ,n}, we may think 
of the state space as having coordinates. There is a substantial body of 
literature concerned with bounding mixing time (i.e., time to convergence 
to near-stationarity) of systems such as those described above. Almost all 
this theoretical work relates to random single-site updates, which choose a 
random coordinate for updating at each transition. We shall refer to this 
strategy as Glauber dynamics. (The term "Glauber dynamics" appears not 
to have a precise agreed meaning. Here we are using the term to signify single 
site updates performed in a random sequence. These are certainly aspects of 
the dynamics first considered by Glauber [18].) However, experimental work 
is often carried out using systematic strategies that cycle through coordi- 
nates in a deterministic manner, a dynamics we refer to as systematic scan 
(or just "scan" for short). The mixing time of systematic scan seems more 
difficult to analyze than that of Glauber, and little is currently known.

In this paper we take some first steps in analyzing systematic scan for 
spin systems. Our setting will be very simple; indeed, for the most part, 
we will restrict attention to proper q-colorings of a path of n vertices (in
statistical physics terms, the one-dimensional q-state Potts model at zero
temperature). To compensate for the simple setting, we provide tight (i.e., 
matching within a constant factor) upper and lower bounds on mixing time. 
Measuring mixing time in terms of the number of updates of individual
vertices (so that one scan equates to n updates), we show that when $q = 3$,
mixing occurs in $\Theta(n^3 \log n)$ updates, whether Glauber dynamics or
systematic scan is used; while when $q \ge 4$, mixing occurs in $\Theta(n \log n)$ updates,
again independently of whether Glauber or scan is used. Our main tools are 
harmonic analysis [29], path coupling [6] and disagreement percolation [28]. 

Later in the paper we considerably widen the setting from usual proper 
colorings to general H-colorings (also known as graph homomorphisms), but
staying at first with the path as the underlying graph. H-colorings model
arbitrary spin systems with symmetric "hard" constraints. We show that,
for any H, Glauber mixes in $O(n^5)$ updates and scan in $O(n^6)$ updates. The
former bound is unlikely to be tight, and the latter even less so. The method 
here is that of canonical paths [10, 27].

Finally, we consider H-colorings of a general graph G, and compare the
mixing times of scan and Glauber. We show that, for any H, these are within 
a polynomial factor of each other (in terms of total number of individual 
updates performed), at least when G is of bounded degree. The question 
of whether scan can ever be faster than Glauber, or vice versa, remains 
a tantalizing open problem. The only situation where a gap is known is 
the rather uninteresting one that arises when G is the empty graph, where
Glauber requires $\Theta(n \log n)$ updates [13], while scan clearly mixes in one
sweep. 

1.1. Previous work. Amit [3] has investigated systematic scan in the con- 
text of sampling from multivariate Gaussian distributions. In this instance, 
one iteration of systematic scan applies a "heat-bath" update to each coor- 
dinate axis in turn. Amit precisely calculates the spectral gap of the scan 
operator and, hence, bounds the mixing time. He also estimates the spectral 
gap of a similar process on perturbed Gaussian distributions. 

In another application of systematic scan — this time more combinatorial 
in nature and slightly closer to the one studied here — Diaconis and Ram 
[8] consider the problem of generating random elements of a finite group. 
The systematic scan Metropolis algorithm cycles through the generators in 
order, and flips coins to decide whether or not to multiply by each genera- 
tor in turn. The random update algorithm chooses one of the n generators 
uniformly at random at each step. For the symmetric group, they show 
that the systematic scan algorithm mixes in $\Theta(n)$ scans, so consideration of
$\Theta(n^2)$ selections of generators is necessary and sufficient for mixing. They
consider two different scanning strategies from [17] — the same results hold 
for both strategies. Matching results (in terms of the number of generators 
considered) are given by Benjamini et al. [5] for the random update strat- 
egy. Diaconis and Ram also consider the hypercube and the dihedral group. 
For the hypercube, they show that $\Theta(n \log n)$ updates are necessary and
sufficient, whether one is doing random updates or systematic scan. For the 
dihedral group, both strategies take O(n) updates. Diaconis and Ram point 
out that careful analysis of rates of convergence for the Metropolis algorithm 
is completely open in nongroup cases. 

For a brief review of other work on systematic scan, consult Diaconis and 
Ram [8], Section 2b. 

2. Definitions and notation. The variation distance between distributions
$\theta_1$ and $\theta_2$ on $\Omega$ is

\[ d_{TV}(\theta_1, \theta_2) = \frac{1}{2} \sum_{i \in \Omega} |\theta_1(i) - \theta_2(i)| = \max_{A \subseteq \Omega} |\theta_1(A) - \theta_2(A)|. \]

For a discrete ergodic Markov chain $\mathcal{M}$ with transition matrix $P$ and
stationary distribution $\pi$, and a specified initial state $x$, the mixing time (as a
function of the deviation $\epsilon$ from stationarity) is

\[ \mathrm{Mix}_x(\mathcal{M}, \epsilon) = \min\{t > 0 : d_{TV}(P^t(x, \cdot), \pi(\cdot)) \le \epsilon\}. \]

The mixing time of $\mathcal{M}$ is $\mathrm{Mix}(\mathcal{M}, \epsilon) = \max_x \mathrm{Mix}_x(\mathcal{M}, \epsilon)$.
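As a small illustration of these two definitions, the following Python sketch computes the variation distance and the mixing time by brute force for an explicitly given transition matrix. The helper names (`tv_distance`, `step`, `mixing_time`) and the three-state example chain are assumptions made purely for illustration; they are not part of the analysis in this paper.

```python
from fractions import Fraction


def tv_distance(p, q):
    """Variation distance: half the L1 distance between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def step(dist, P):
    """Advance a distribution (row vector) by one transition of the chain."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def mixing_time(P, pi, eps, t_max=10_000):
    """Worst case over initial states x of min{t > 0 : d_TV(P^t(x, .), pi) <= eps}."""
    n = len(P)
    worst = 0
    for x in range(n):
        dist = [Fraction(int(y == x)) for y in range(n)]
        for t in range(1, t_max + 1):
            dist = step(dist, P)
            if tv_distance(dist, pi) <= eps:
                worst = max(worst, t)
                break
        else:
            raise RuntimeError("chain did not reach eps within t_max steps")
    return worst


# Hypothetical example: a lazy random walk on three states.
half, quarter = Fraction(1, 2), Fraction(1, 4)
P = [[half, quarter, quarter],
     [quarter, half, quarter],
     [quarter, quarter, half]]
pi = [Fraction(1, 3)] * 3
```

Exact rational arithmetic is used so that the comparison with $\epsilon$ is not affected by rounding; this brute-force approach is only feasible for very small state spaces.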

Suppose $G$ is an undirected graph with vertex set $\{1, \ldots, n\}$. To avoid
trivialities, we assume $n \ge 3$. We consider $q$-colorings of $G$, where $q \ge 3$.
Formally, a coloring $\sigma$ is a vector $\sigma = (\sigma_1, \ldots, \sigma_n)$ in which $\sigma_i \in \{0, \ldots, q-1\}$
denotes the color of vertex $i$. A coloring is proper if adjacent vertices receive
different colors. $\Omega^+ = \{0, \ldots, q-1\}^n$ is the set of all colorings (proper and
improper), while $\Omega$ is the set of all proper colorings.

A Markov chain with state space $\Omega$ starts at a coloring $\sigma(0)$ and visits
a sequence of colorings $\sigma(0), \sigma(1), \ldots$. We often use $\tau$ to denote a coloring
(when we need two names). The two Markov chains that we study are as
follows:

• $\mathcal{M}_{\mathrm{Gl}}$ (Glauber): Choose vertex $v$ uniformly at random; do Metropolis($v$).

• $\mathcal{M}_{\rightarrow}$ (Systematic scan): For $v := 1$ to $n$, do Metropolis($v$).

The procedure Metropolis (v) used in both of the above dynamics performs 
as follows: A color c is chosen uniformly at random. A proposed new coloring 
is formed by recoloring vertex v with color c. This proposed move is accepted 
if and only if color c is not used at any neighbor of v. 
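The two dynamics and the Metropolis procedure can be sketched in Python as follows. This is a minimal illustration, assuming vertices are numbered from 0 rather than 1 and that the graph is given as adjacency lists; the function names and the `path_adjacency` helper are hypothetical.

```python
import random


def metropolis(sigma, v, adj, q, rng):
    """Metropolis(v): propose a uniform color c; accept iff no neighbor of v has color c."""
    c = rng.randrange(q)
    if all(sigma[u] != c for u in adj[v]):
        sigma[v] = c


def glauber_step(sigma, adj, q, rng):
    """One update of Glauber dynamics: a uniformly random vertex is updated."""
    v = rng.randrange(len(sigma))
    metropolis(sigma, v, adj, q, rng)


def scan_sweep(sigma, adj, q, rng):
    """One sweep of systematic scan: vertices are updated in increasing order."""
    for v in range(len(sigma)):
        metropolis(sigma, v, adj, q, rng)


def path_adjacency(n):
    """Adjacency lists of the path on vertices 0, ..., n-1."""
    return [[u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)]


def is_proper(sigma, adj):
    """True iff no edge is monochromatic."""
    return all(sigma[v] != sigma[u] for v in range(len(sigma)) for u in adj[v])
```

Starting from a proper coloring, both `glauber_step` and `scan_sweep` only ever produce proper colorings, since a proposal is rejected whenever it would create a monochromatic edge. One sweep corresponds to n individual updates.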

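Let $P_{\mathrm{Gl}}$ be the transition matrix of $\mathcal{M}_{\mathrm{Gl}}$ and let $P_{\rightarrow}$ be the transition
matrix of $\mathcal{M}_{\rightarrow}$. It will be convenient in our proofs to consider reverse
systematic scan:

• $\mathcal{M}_{\leftarrow}$ (Reverse scan): For $v := n$ down to 1, do Metropolis($v$).

Let $P_{\leftarrow}$ be the transition matrix of $\mathcal{M}_{\leftarrow}$. Observe that $\mathcal{M}_{\leftarrow}$ is the time
reversal of $\mathcal{M}_{\rightarrow}$, since $P_{\rightarrow}(\sigma, \sigma') = P_{\leftarrow}(\sigma', \sigma)$ for all $\sigma, \sigma' \in \Omega$.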

Let $\mathcal{M}$ be any discrete Markov chain with transition matrix $P$, stationary
distribution $\pi$ and state space $\Omega$. Define the optimal Poincaré constant of
$\mathcal{M}$ by

\[ \lambda(\mathcal{M}) = \inf_{f : \Omega \to \mathbb{R}} \frac{\mathcal{E}_{\mathcal{M}}(f, f)}{\mathrm{var}_\pi(f)}, \]

where the inf is over all nonconstant functions from $\Omega$ to $\mathbb{R}$ and the Dirichlet
form is given by

\[ \mathcal{E}_{\mathcal{M}}(f, f) = \frac{1}{2} \sum_{x, y \in \Omega} \pi(x) P(x, y) (f(x) - f(y))^2 \]

and

\[ \mathrm{var}_\pi(f) = \sum_{x \in \Omega} \pi(x) (f(x) - E_\pi f)^2 = \frac{1}{2} \sum_{x, y \in \Omega} \pi(x) \pi(y) (f(x) - f(y))^2. \]

If $\mathcal{M}$ is time-reversible with respect to $\pi$ [i.e., $\pi(\sigma) P(\sigma, \tau) = \pi(\tau) P(\tau, \sigma)$],
then the eigenvalues of $P$ are real and can be written $1 = \beta_0 > \beta_1 \ge \cdots \ge -1$.
Then $\lambda(\mathcal{M})$ is equal to $1 - \beta_1$.

Some of our rapid-mixing proofs will use the method of "path coupling"
[6]. In our path-coupling proofs, we will define partial couplings on the set
$S$, which will always be the set of pairs of colorings that differ on a single
vertex.

For most of the paper we consider the case in which G is a path going left 
to right from vertex 1 to vertex $n$. Kenyon and Randall [24] have shown that,
for every $q$, the block dynamics, which updates a sufficiently large constant-length
path at each step, mixes in time $O(n \log n)$. Our results show that
this upper bound holds for single-site dynamics for $q \ge 4$, but not for $q = 3$.

In our analysis for $q = 3$ we will study two auxiliary Markov chains
on state space $T = \{-1, 1\}^{n-1}$. A configuration $X \in T$ is a vector $X =
(X_1, \ldots, X_{n-1})$. The corresponding Markov chain evolves as $X(0), X(1), \ldots$.
The next section generalizes this framework. 

3. The analysis technique for q = 3. The following develops an idea 
of Wilson [29] for lower bounding the convergence rate of certain types of 
Markov chains. 

Let $\mathcal{M}$ be a finite ergodic Markov chain with transition matrix $P$ and
state space $T \subseteq \mathbb{Z}^m$. (This is a more general setting than our current
application demands, but it is the natural one in which to develop the ideas.)
Suppose there exists a matrix $A$ such that $E[X(1) \mid X(0)] = AX(0)$ for all
$X(0) \in T$. (The method may still be applicable when we have only an
affine dependence here. For provided $A - I$ is invertible, an affine
dependence $E[X(1)] = AX(0) + b$ can be reduced to one of the required form by
moving the origin in $T$. In particular, $E[X(1)] = AX(0) + b$ is the same as
$E[X(1) + c] = A(X(0) + c)$ for $b = (A - I)c$.) We will assume that $A$ has real
eigenvalues, though it is possible to extend the method to complex
eigenvalues. We may further assume that $A$ has only nonnegative eigenvalues, since
otherwise we can consider the two-step chain $\mathcal{M}^2 = (T, P^2)$ which converges
exactly twice as fast. Now let $\lambda$ be any eigenvalue of $A$, with left eigenvector
$w$. Then,

\[ E[wX(t) \mid X(0)] = wA^t X(0) = \lambda^t wX(0). \tag{1} \]

Let $\Phi_t = wX(t)$. To obtain the strongest lower bound, we choose $\lambda$ to be
the largest eigenvalue such that there exist $x, y \in T$ with $wx \ne wy$. Then
we choose $X(0) = \arg\max_x |wx|$. Since $w$ is defined only up to scalar
multiples, we may assume $wX(0) > 0$. It follows from (1) that $\lambda \le 1$. Otherwise
$\limsup_{t \to \infty} E[\Phi_t] = \infty$, contradicting the finiteness of $\mathcal{M}$. If $\lambda = 1$, we have
$E[\Phi_t] = \Phi_0$ for all $t$. But $\Phi_t \le \Phi_0$, so we must have $\Phi_t = \Phi_0$ for all $t$. Using
ergodicity of $\mathcal{M}$, this implies $wx = wy$ for all $x, y \in T$, contradicting our
choice of $\lambda$. Thus, $\lambda < 1$ and, hence, $\lim_{t \to \infty} E[wX(t)] = 0$. If $X(\infty)$ denotes
(a r.v. with) the equilibrium distribution, it follows that $E[wX(\infty)] = 0$.

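We will now consider the quantities $E[\Phi_t \mid \Phi_{t-1}]$ and $\mathrm{var}(\Phi_t \mid \Phi_{t-1})$.
Definitions of conditional expectations and variances can be found in [11] (pages
190-198). We will use the fact that $\mathrm{var}(Y) = E[\mathrm{var}(Y \mid X)] + \mathrm{var}(E[Y \mid X])$
(page 198). Suppose that $E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})] \le \rho$ for all $t > 0$, and let $\nu =
\rho/(1 - \lambda^2)$. Now using $E[\Phi_t \mid \Phi_{t-1}] = \lambda \Phi_{t-1}$ and $\mathrm{var}(\Phi_0) = 0$,

\begin{align*}
\mathrm{var}(\Phi_t) &= E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})] + \mathrm{var}(E[\Phi_t \mid \Phi_{t-1}]) \\
&= E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})] + \mathrm{var}(\lambda \Phi_{t-1}) \\
&= E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})] + \lambda^2 \mathrm{var}(\Phi_{t-1}) \tag{2} \\
&\le \rho + \lambda^2 \mathrm{var}(\Phi_{t-1}) \\
&\le \rho \sum_{i=0}^{t-1} \lambda^{2i} < \rho/(1 - \lambda^2) = \nu.
\end{align*}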

Instead of (2), Wilson uses $\nu = R/2\gamma$, where $\gamma = 1 - \lambda$ and $E[(\Phi_t -
\Phi_{t-1})^2 \mid \Phi_{t-1}] \le R$. The calculation to justify this is longer, and the
conclusion is not valid for all $\lambda$. However, since $\rho \le R$ and usually $\lambda = 1 - o(1)$, (2)
implies Wilson's bound asymptotically, but, in general, they are
incomparable. Now, using Chebyshev's inequality,

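\[ \Pr\left(\Phi_t \le \lambda^t \Phi_0 - \sqrt{\frac{2\nu}{\epsilon}}\right) \le \frac{1}{2}\epsilon \quad\text{and}\quad \Pr\left(\Phi_\infty \ge \sqrt{\frac{2\nu}{\epsilon}}\right) \le \frac{1}{2}\epsilon. \]

Thus, $d_{TV}(\Phi_t, \Phi_\infty) < 1 - \epsilon$ only if $\lambda^t \Phi_0 \le 2\sqrt{2\nu/\epsilon}$. [We will abuse the
notation $d_{TV}(\cdot, \cdot)$ for variation distance by extending it to random variables.]
The latter inequality holds only if

\[ t \ge \frac{\ln(\Phi_0 \sqrt{\epsilon/8\nu})}{\ln(1/\lambda)} \ge \frac{\lambda \ln(\Phi_0 \sqrt{\epsilon/8\nu})}{1 - \lambda}. \]

Setting $\epsilon = \frac{1}{2}$, we find that

\[ \mathrm{Mix}\left(\mathcal{M}, \tfrac{1}{2}\right) \ge \frac{\lambda \ln(\Phi_0/4\sqrt{\nu})}{1 - \lambda}. \tag{3} \]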

We say that a Markov chain is monotone with respect to a partial order
$\le$ on its state space if two realizations $X(t)$ and $Y(t)$ of it may be coupled
so that $X(0) \ge Y(0)$ implies $X(t) \ge Y(t)$ for all $t \in \mathbb{N}$. We refer to such a
coupling as a "monotone coupling." Suppose $\mathcal{M}$ is monotone with respect
to the product partial order $\le$ on $\mathbb{R}^m$, that is, the partial order defined
by $x \le y$ if and only if $x_i \le y_i$ for all $i \in \{1, \ldots, m\}$. If the weight vector
$w > 0$ (in the product order), we can use it to bound the mixing time from
above. Let us re-scale $w$ so that $\min_i w_i = 1$. Let $d(x, y) = \sum_{i=1}^m w_i |x_i - y_i|$ for
$x, y \in T$. Then $d$ is a metric, since $w > 0$. Now consider $x, y \in T$ with $x \ge y$
and let $X(t), Y(t)$ be a monotone coupling with $X(0) = x$ and $Y(0) = y$.
Since $X(t) \ge Y(t)$,

\begin{align*}
E[d(X(t), Y(t))] = E[w(X(t) - Y(t))] &= wA(X(t-1) - Y(t-1)) \\
&= \lambda w(X(t-1) - Y(t-1)) = \lambda d(X(t-1), Y(t-1)).
\end{align*}

So

\begin{align*}
d_{TV}(X(t), Y(t)) &\le \Pr[X(t) \ne Y(t)] \\
&\le E[d(X(t), Y(t))] \tag{4} \\
&\le \lambda^t d(X(0), Y(0)) \le 2\lambda^t \Phi_0,
\end{align*}

where the final step is by the triangle inequality. Thus, $d_{TV}(X(t), Y(t)) \le \epsilon$
holds, provided $t \ge \ln(2\Phi_0/\epsilon)/\ln(1/\lambda)$, which is the case in particular when
$t \ge \ln(2\Phi_0/\epsilon)/(1 - \lambda)$, since $\ln(1/\lambda) \ge 1 - \lambda$.

We would like to draw a similar conclusion when $x$ and $y$ are incomparable.
We can do so provided the state space contains states $\top$ and $\bot$ satisfying
$\bot \le z \le \top$ for all $z \in T$. In this case, $\Pr(X(t) \ne Y(t) \mid X(0) = x, Y(0) =
y) \le \Pr(X(t) \ne Y(t) \mid X(0) = \top, Y(0) = \bot)$ so we can apply (4) starting from
$X(0) = \top$ and $Y(0) = \bot$. Thus,

\[ \mathrm{Mix}(\mathcal{M}, \epsilon) \le \ln(2\Phi_0/\epsilon)/(1 - \lambda). \tag{5} \]

When $\nu$ is sufficiently small with respect to $\Phi_0$, the upper bound (5) and
the lower bound (3) agree to within a constant factor on the time to reach
variation distance $\frac{1}{2}$, say.

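3.1. Bounding $E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})]$. In order to use the technique in Section
3, we have to find a $\rho$ such that $E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})] \le \rho$, where the expectation
is over $\Phi_{t-1}$.

For this, let $Z_t = \Phi_t - \Phi_{t-1}$. Then

\begin{align*}
E[\mathrm{var}(\Phi_t \mid \Phi_{t-1})] &= E[\mathrm{var}(\Phi_{t-1} + Z_t \mid \Phi_{t-1})] \\
&= E[\mathrm{var}(Z_t \mid \Phi_{t-1})] \\
&= E[E[Z_t^2 \mid \Phi_{t-1}] - (E[Z_t \mid \Phi_{t-1}])^2] \\
&\le E[E[(\Phi_t - \Phi_{t-1})^2 \mid \Phi_{t-1}]] \\
&\le \max_{\Phi_{t-1}} E[(\Phi_t - \Phi_{t-1})^2 \mid \Phi_{t-1}].
\end{align*}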

We will use the above inequality to find a suitable $\rho$.

3.2. A benchmark example. We illustrate the technique by applying it
to a simple example whose analysis was also given in the Introduction to
[8]. Consider mixing on the cube $\{-1, +1\}^m$ of the chain which changes the
sign of a uniform random coordinate with probability $\frac{1}{2}$. Then

\[ E[X_i(t+1)] = (1 - 1/m) X_i(t), \]

so $A = (1 - 1/m)I$, and all its eigenvalues are equal to $(1 - 1/m)$. We may
choose an arbitrary $w$, say, the vector $(1, 1, \ldots, 1)$. Then we can take $\rho = 2$,
so $\nu = 2m/(2 - 1/m) < 2m$. Taking $X(0) = w$, we have $\Phi_0 = m$, and the lower bound
(3) for mixing time is $\frac{1}{2} m \ln m - O(m)$. This chain is monotone, and the
upper bound (5) is $m \ln m + O(m)$.
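The arithmetic of this benchmark can be checked numerically. The sketch below evaluates the lower bound (3) and the upper bound (5) for the cube, taking $\lambda = 1 - 1/m$, $\rho = 2$, $\nu = \rho/(1 - \lambda^2)$ and $\Phi_0 = m$ as in the text; the function name `hypercube_bounds` and the default choice $\epsilon = 1/2$ for the upper bound are assumptions made for illustration.

```python
import math


def hypercube_bounds(m, eps=0.5):
    """Evaluate bounds (3) and (5) for the benchmark chain on {-1, +1}^m.

    Returns (lower, upper, nu). The lower bound (3) is stated for deviation 1/2;
    the upper bound (5) is evaluated at the given eps.
    """
    gap = 1.0 / m                 # 1 - lambda
    lam = 1.0 - gap               # the common eigenvalue of A
    rho = 2.0                     # bound on E[var(Phi_t | Phi_{t-1})]
    nu = rho / (1.0 - lam ** 2)   # equals 2m / (2 - 1/m)
    phi0 = float(m)               # w X(0) with w = X(0) = (1, ..., 1)
    lower = lam * math.log(phi0 / (4.0 * math.sqrt(nu))) / gap
    upper = math.log(2.0 * phi0 / eps) / gap
    return lower, upper, nu
```

For large m the returned lower bound differs from $\frac{1}{2} m \ln m$ by a term linear in m, and the upper bound equals $m \ln(2m/\epsilon)$, in line with the estimates above.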

4. Glauber dynamics for $q = 3$ mixes in $\Theta(n^3 \log n)$ updates. Let $G$
be a path going left to right from vertex 1 to vertex $n$. Recall that $\Omega$ is the
set of all proper colorings of $G$.

4.1. Analysis of a related Markov chain. Let $\sigma$ be a coloring in $\Omega$. Note
that, for every $i \in \{1, \ldots, n-1\}$, we either have $\sigma_{i+1} = \sigma_i + 1 \pmod 3$ or
$\sigma_{i+1} = \sigma_i - 1 \pmod 3$. We can associate $\sigma$ with a vector $X \in T = \{-1, 1\}^{n-1}$:
$X_i$ is 1 if $\sigma_{i+1} = \sigma_i + 1 \pmod 3$ and $X_i = -1$ otherwise. (Note that three
colorings are mapped to the same configuration $X$; given $\sigma_1$ and $X$, the
coloring $\sigma$ can be recovered.)
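The correspondence between colorings and sign vectors is easy to make concrete. In the Python sketch below, a coloring is a list of colors in {0, 1, 2} for vertices 1, ..., n in order; the function names are hypothetical and chosen only for this illustration.

```python
def coloring_to_signs(sigma):
    """Map a proper 3-coloring of the path to its vector X in {-1, +1}^(n-1)."""
    x = []
    for a, b in zip(sigma, sigma[1:]):
        d = (b - a) % 3
        if d == 0:
            raise ValueError("not a proper 3-coloring of the path")
        x.append(1 if d == 1 else -1)
    return x


def signs_to_coloring(first_color, x):
    """Recover the coloring from the color of vertex 1 and the vector X."""
    sigma = [first_color]
    for s in x:
        sigma.append((sigma[-1] + s) % 3)
    return sigma
```

For instance, `coloring_to_signs([0, 1, 0, 2, 1])` gives `[1, -1, -1, -1]`, and each of the three choices of `first_color` yields a distinct coloring with the same sign vector.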

The Markov chain $\mathcal{M}_{\mathrm{Gl}}$ can be associated with a Markov chain $\mathcal{M}'_{\mathrm{Gl}}$
on $T$. The moves of $\mathcal{M}'_{\mathrm{Gl}}$ (from configuration $X$) are as follows. Choose
$r \in \{1, \ldots, n\}$ uniformly at random. If $r = 1$ (resp. $r = n$), then either, with
probability $\frac{1}{3}$, change the sign of $X_1$ (resp. $X_{n-1}$) or, with the
complementary probability, do nothing. Otherwise (i.e., if $1 < r < n$) then either, with
probability $\frac{1}{3}$, exchange $X_{r-1}$ and $X_r$ or, with the complementary
probability, do nothing.
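The claimed correspondence between Metropolis updates on colorings and sign flips or exchanges on $T$ can be verified exhaustively for small n. The following Python sketch does so under the assumption that exactly one of the three proposed colors at a vertex produces the flip or exchange (when that move changes $X$ at all) and the other two leave $X$ unchanged; all function names are hypothetical.

```python
from itertools import product


def proper_colorings(n):
    """All proper 3-colorings of the path on n vertices."""
    return [s for s in product(range(3), repeat=n)
            if all(a != b for a, b in zip(s, s[1:]))]


def try_color(sigma, v, c):
    """Metropolis at vertex v (0-indexed) with proposed color c."""
    nbrs = [u for u in (v - 1, v + 1) if 0 <= u < len(sigma)]
    if any(sigma[u] == c for u in nbrs):
        return tuple(sigma)
    s = list(sigma)
    s[v] = c
    return tuple(s)


def signs(sigma):
    """The vector X associated with a proper coloring."""
    return tuple(1 if (b - a) % 3 == 1 else -1 for a, b in zip(sigma, sigma[1:]))


def expected_move(x, r, n):
    """Flip X_1 (r = 1) or X_{n-1} (r = n); otherwise exchange X_{r-1} and X_r."""
    x = list(x)
    if r == 1:
        x[0] = -x[0]
    elif r == n:
        x[-1] = -x[-1]
    else:
        x[r - 2], x[r - 1] = x[r - 1], x[r - 2]
    return tuple(x)


def check_correspondence(n):
    """Check that each vertex update induces the stated move with probability 1/3."""
    for sigma in proper_colorings(n):
        x = signs(sigma)
        for r in range(1, n + 1):
            target = expected_move(x, r, n)
            outcomes = [signs(try_color(sigma, r - 1, c)) for c in range(3)]
            if target == x:
                if outcomes.count(x) != 3:
                    return False
            elif outcomes.count(target) != 1 or outcomes.count(x) != 2:
                return False
    return True
```

Note that when $X_{r-1} = X_r$ the exchange leaves the configuration unchanged, which matches the fact that the vertex then has two differently colored neighbors and cannot be recolored.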

In this section we analyze the mixing rate of the auxiliary chain $\mathcal{M}'_{\mathrm{Gl}}$ on $T$. Note that this chain is
monotone with respect to the usual partial order on $T$. Then straightforward
calculations give

\[ E[X(t+1)] = AX(t), \quad\text{where } A = I - \frac{1}{3n} B \]

and

\[ B = \begin{pmatrix} 3 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 3 \end{pmatrix}. \]

Note that $A$ is symmetric so has all real eigenvalues. Moreover, $A$ is
nonnegative and irreducible, so its largest eigenvalue $\lambda$ has a positive eigenvector
$w$. The eigenvectors are identical to those of $B$, and $\lambda = (1 - \lambda'/3n)$, where
$\lambda'$ is the smallest eigenvalue of $B$. The "generic" row gives the equation

\[ -w_{i-1} + 2w_i - w_{i+1} = \lambda' w_i, \tag{6} \]

the form of which suggests a simple harmonic oscillation. So we will try the
solution $w_i = c_n \sin(\alpha i + \beta)$, where $c_n$ is a positive scaling factor to be chosen
later. Substituting in (6) gives $\lambda' = 2(1 - \cos\alpha) = 4\sin^2(\alpha/2)$. We also have
the two "boundary conditions"

\[ 3w_1 - w_2 = \lambda' w_1, \qquad -w_{n-2} + 3w_{n-1} = \lambda' w_{n-1}. \tag{7} \]

The first equation in (7) gives $\sin(\alpha + \beta) = -\sin\beta$, that is, $\beta = -\alpha/2$. The
second then gives $\sin((n - \frac{1}{2})\alpha) = -\sin((n - \frac{3}{2})\alpha)$, so $(n - \frac{1}{2})\alpha = 2\pi - (n -
\frac{3}{2})\alpha$, that is, $\alpha = \pi/(n-1)$. Thus,

\[ w_i = c_n \sin\left(\frac{\pi(i - 1/2)}{n-1}\right) > 0, \qquad i = 1, \ldots, n-1, \]

and $w = (w_i)$ is the (positive) eigenvector corresponding to the largest
eigenvalue. Our upper bound on mixing time requires $w_i \ge 1$, for all $1 \le i \le n-1$,
and we set $c_n \sim 2n/\pi$ to achieve this. (The symbol "$\sim$" denotes asymptotic
convergence as $n \to \infty$.)

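Now, if we let $w_0 = w_n = 0$, we may take

\[ \rho = 2 \max_{1 \le i \le n} (w_i - w_{i-1})^2 = 2(w_2 - w_1)^2 \sim 8. \]

Also, $\lambda = 1 - 4\sin^2(\pi/(2n-2))/3n$, so $1 - \lambda \sim \pi^2/3n^3$. Hence, $\nu \sim 12n^3/\pi^2$.
Taking $X(0)$ to be the all 1's vector,

\[ \Phi_0 = c_n \sum_{i=1}^{n-1} \sin\left(\frac{\pi(i - 1/2)}{n-1}\right) = c_n \,\mathrm{Im} \sum_{i=1}^{n-1} \exp\left(\frac{\mathrm{i}\pi(i - 1/2)}{n-1}\right) = c_n \,\mathrm{cosec}\left(\frac{\pi}{2(n-1)}\right) \sim \left(\frac{2n}{\pi}\right)^2, \]

where $\mathrm{i} = \sqrt{-1}$ in the second equality and the final equality follows from
simplifying the geometric series as follows. For $\xi = \pi/(n-1)$, the sum is
equal to

\[ \frac{e^{\mathrm{i}\xi/2}(1 - e^{\mathrm{i}(n-1)\xi})}{1 - e^{\mathrm{i}\xi}} = \frac{-2}{\exp(\mathrm{i}\pi/2(n-1)) - \exp(-\mathrm{i}\pi/2(n-1))} = \frac{\mathrm{i}}{\sin(\pi/2(n-1))}. \]

Substituting for $\lambda$, $\Phi_0$ and $\nu$ in the mixing time lower bound, (3) yields
$\mathrm{Mix}(\mathcal{M}'_{\mathrm{Gl}}, \frac{1}{2}) \ge \frac{3}{2}\pi^{-2} n^3 \ln n - O(n^3)$ for the auxiliary chain $\mathcal{M}'_{\mathrm{Gl}}$ on $T$. Also (for any positive $\epsilon$), the upper
bound (5) is $\mathrm{Mix}(\mathcal{M}'_{\mathrm{Gl}}, \epsilon) \le 3\pi^{-2} n^3 (2\ln n + \ln \epsilon^{-1}) + O(n^3)$. In summary,
the mixing time of $\mathcal{M}'_{\mathrm{Gl}}$ is $\Theta(n^3 \log n)$.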
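The eigenvector calculation can be sanity-checked numerically. The sketch below assumes that $B$ is the tridiagonal matrix with diagonal $(3, 2, \ldots, 2, 3)$ and off-diagonal entries $-1$, as read off from equations (6) and (7), and takes $c_n = 1$ for simplicity; the function name `check_eigenvector` is hypothetical.

```python
import math


def check_eigenvector(n):
    """Check B w = lambda' w for w_i = sin(pi (i - 1/2) / (n - 1)), i = 1..n-1.

    Returns (residual, gap, phi0) where residual is the largest entry of
    |B w - lambda' w|, gap = 1 - lambda = lambda' / (3n), and phi0 is the
    sum of the w_i (that is, Phi_0 / c_n for the all-ones start).
    """
    alpha = math.pi / (n - 1)
    w = [math.sin(alpha * (i - 0.5)) for i in range(1, n)]
    lam_prime = 4.0 * math.sin(alpha / 2.0) ** 2
    m = n - 1
    residual = 0.0
    for i in range(m):
        diag = 3.0 if i in (0, m - 1) else 2.0   # boundary rows of B have 3
        bw = diag * w[i]
        if i > 0:
            bw -= w[i - 1]
        if i < m - 1:
            bw -= w[i + 1]
        residual = max(residual, abs(bw - lam_prime * w[i]))
    return residual, lam_prime / (3.0 * n), sum(w)
```

For moderately large n the returned gap is close to $\pi^2/3n^3$ and the returned sum agrees with $\mathrm{cosec}(\pi/2(n-1))$, consistent with the asymptotics used above.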

4.2. Distance measures and a lower bound for Glauber dynamics. We will
use two distance measures to analyze the Glauber-dynamics Markov chain
$\mathcal{M}_{\mathrm{Gl}}$. First, we define the distance $d_1(\sigma, \tau)$ for $\sigma \in \Omega$ and $\tau \in \Omega$ as follows.
Let $X$ be the member of $T$ associated with $\sigma$ and $Y$ be the member of
$T$ associated with $\tau$. Let $d_1(\sigma, \tau) = \mathrm{Ham}(X, Y)$, where $\mathrm{Ham}(X, Y)$ is the
Hamming distance between $X$ and $Y$, which is the number of indices $i$ such
that $X_i \ne Y_i$.

Using distance measure $d_1$, the lower bound from Section 4.1 applies
directly to $\mathcal{M}_{\mathrm{Gl}}$. Thus, we obtain the following theorem.

Theorem 1. Let $G$ be the $n$-vertex path, and let $q = 3$. Then a lower
bound on the mixing time of the Markov chain $\mathcal{M}_{\mathrm{Gl}}$ on the state space $\Omega$ is
given by $\mathrm{Mix}(\mathcal{M}_{\mathrm{Gl}}, \frac{1}{2}) \ge \frac{3}{2}\pi^{-2} n^3 \ln n - O(n^3)$.

In order to upper-bound the mixing time of $\mathcal{M}_{\mathrm{Gl}}$, we will also define a
second distance measure.

Give the vertices $1, \ldots, n$ weights $\lambda_1, \ldots, \lambda_n$, respectively. These weights
are positive rationals. Denote by $S \subseteq \Omega \times \Omega$ the set of all pairs of states
(colorings) that differ at a single vertex (i.e., are Hamming distance 1 apart).
If $(\sigma, \tau) \in S$ differs at vertex $i$, then let $\phi(\sigma, \tau) = \lambda_i$. Define the function $d_2$
on $\Omega \times \Omega$ as follows. For each pair $(\sigma, \tau) \in \Omega \times \Omega$, let

\[ d_2(\sigma, \tau) = \min_{\omega(0), \ldots, \omega(k)} \sum_{j=0}^{k-1} \phi(\omega(j), \omega(j+1)), \tag{8} \]

where the minimum is over all paths $\sigma = \omega(0), \ldots, \omega(k) = \tau$ such that each
$\omega(j) \in \Omega$ and each pair $(\omega(j), \omega(j+1)) \in S$. A path $\omega(0), \ldots, \omega(k)$ attaining
the minimum in (8) is referred to as a geodesic path from $\sigma$ to $\tau$.

In our couplings, we will want to be able to bound the expected change
in the distance $d_2$. In order to do this, we use height functions. A height
function $h$ corresponding to a proper coloring $\sigma$ is a vector in $\mathbb{Z}^n$ satisfying
the following properties:

1. For every vertex $i$, $h_i \equiv i \pmod 2$.

2. For every vertex $i$, $h_i \equiv \sigma_i \pmod 3$.

3. For every edge $(i, i+1)$, $|h_i - h_{i+1}| = 1$.

The height function is unique up to an additive constant. We define the
distance between two height functions, $h$ and $h^*$, to be

\[ d(h, h^*) = \sum_{i \in \{1, \ldots, n\}} \frac{|h_i - h_i^*|}{2} \lambda_i. \]

Let $\mathcal{H}(\sigma)$ denote the set of height functions corresponding to coloring $\sigma$.
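A height function can be built greedily from left to right. The Python sketch below is one way to do this, assuming that the distance between height functions is read as $\sum_i \lambda_i |h_i - h_i^*|/2$, so that changing a single height by two costs exactly the weight of that vertex; the function names and the `shift` parameter (which selects the additive constant, a multiple of 6) are hypothetical.

```python
from fractions import Fraction


def height_function(sigma, shift=0):
    """A height function for the proper 3-coloring sigma of the path.

    sigma[i - 1] is the color of vertex i. The height of vertex 1 is the residue
    mod 6 that is odd and congruent to sigma_1 mod 3, plus 6 * shift.
    """
    h1 = next(h for h in range(6) if h % 2 == 1 and h % 3 == sigma[0])
    h = [h1 + 6 * shift]
    for a, b in zip(sigma, sigma[1:]):
        h.append(h[-1] + (1 if (b - a) % 3 == 1 else -1))
    return h


def is_height_function(h, sigma):
    """Check properties 1-3 for h against the coloring sigma."""
    n = len(sigma)
    parity = all(h[i - 1] % 2 == i % 2 for i in range(1, n + 1))
    color = all(h[i - 1] % 3 == sigma[i - 1] for i in range(1, n + 1))
    edges = all(abs(a - b) == 1 for a, b in zip(h, h[1:]))
    return parity and color and edges


def height_distance(h, h_star, weights):
    """Weighted distance: sum over vertices of weight * |h_i - h*_i| / 2."""
    return sum(w * Fraction(abs(a - b), 2) for a, b, w in zip(h, h_star, weights))
```

With weights $\frac{1}{2}, 1, \ldots, 1, \frac{1}{2}$, two colorings that differ by a cyclic permutation of colors have height functions differing by two at every vertex, giving distance $n - 1$.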

Lemma 2. For any pair of colorings $(\sigma, \tau) \in \Omega \times \Omega$,

\[ d_2(\sigma, \tau) = \min_{h \in \mathcal{H}(\sigma),\, h^* \in \mathcal{H}(\tau)} d(h, h^*). \]

Proof. To show that

\[ d_2(\sigma, \tau) \ge \min_{h \in \mathcal{H}(\sigma),\, h^* \in \mathcal{H}(\tau)} d(h, h^*), \]

consider a geodesic path from $\sigma$ to $\tau$. Let $h'(0)$ be any height function in
$\mathcal{H}(\sigma)$ and let $h'(0), \ldots, h'(k)$ be the sequence of height functions
corresponding to the geodesic path. Now

\[ \min_{h \in \mathcal{H}(\sigma),\, h^* \in \mathcal{H}(\tau)} d(h, h^*) \le d(h'(0), h'(k)) \le \sum_{i=0}^{k-1} d(h'(i), h'(i+1)) = d_2(\sigma, \tau). \]

To show that

\[ d_2(\sigma, \tau) \le \min_{h \in \mathcal{H}(\sigma),\, h^* \in \mathcal{H}(\tau)} d(h, h^*), \]

consider any $h \in \mathcal{H}(\sigma)$ and $h^* \in \mathcal{H}(\tau)$. A "height-function transformation"
(see [20]) either takes a local maximum of a height function and pushes
it down by two or takes a local minimum and pushes it up by two. We
can show that there is a sequence $h = h(0), \ldots, h(k) = h^*$ of height-function
transformations transforming $h$ into $h^*$ that chooses each vertex $v$ only $|h_v -
h_v^*|/2$ times. (This can be proved by induction on $\sum_v |h_v - h_v^*|$. See Lemma
4.3 of [20].) Now let $\omega(0), \ldots, \omega(k)$ be the sequence of colorings corresponding
to $h(0), \ldots, h(k)$. Note that

\[ \sum_{j=0}^{k-1} \phi(\omega(j), \omega(j+1)) \le d(h, h^*). \]

Thus, $d_2(\sigma, \tau) \le d(h, h^*)$. □

4.3. An upper bound for Glauber dynamics. Our upper bound comes
from a two-stage argument. In the first stage we observe the evolution of
$\mathcal{M}_{\mathrm{Gl}}$, not directly, but via the auxiliary Markov chain $\mathcal{M}'_{\mathrm{Gl}}$ on $T$. We know from
Section 4.1 that the latter mixes in time $O(n^3 \log n)$. Each state of $\mathcal{M}'_{\mathrm{Gl}}$
corresponds to $q$ states of $\mathcal{M}_{\mathrm{Gl}}$, so at this point we know that $\mathcal{M}_{\mathrm{Gl}}$ has mixed
modulo a cyclic permutation of colors. In the second stage we show, using
the $d_2$ metric, that two colorings $\sigma$ and $\tau$ differing by such a permutation
may be coupled in a further $O(n^3)$ steps. The first stage gives the coupling a
head start in the sense that $\sigma$ and $\tau$ are already quite close in the $d_2$ metric.
Omitting the first stage and running the $d_2$-coupling in isolation would yield
only an $O(n^5)$ bound on mixing time.

Recall that $d_1$ is Hamming distance on $T$. Suppose $(\sigma(0), \tau(0)) \in \Omega \times \Omega$.
Then

\[ \Pr(d_1(\sigma(t), \tau(t)) \ge 1) \le E(d_1(\sigma(t), \tau(t))). \]

Applying (4) to the analysis in Section 4.1, the right-hand side is at most
$2\lambda^t \Phi_0$, where $\Phi_0 = O(n^2)$ and $1 - \lambda \sim \pi^2/3n^3$. So for some $t' = O(n^3 \log n)$,
we will have $d_1(\sigma(t'), \tau(t')) = 0$, with probability at least $\frac{39}{40}$. By Lemma 2,
$d_1(\sigma(t'), \tau(t')) = 0$ implies that $d_2(\sigma(t'), \tau(t')) = 0$ or $d_2(\sigma(t'), \tau(t')) =
\sum_{i \in \{1, \ldots, n\}} \lambda_i$.

Now choose weights $\lambda_1 = \lambda_n = 1/2$ and $\lambda_2 = \cdots = \lambda_{n-1} = 1$. We use path
coupling on pairs $(\sigma(0), \tau(0)) \in S$. Starting with such a pair, run $t'$ steps
to get $(\sigma(t'), \tau(t'))$. With probability at least 39/40, $d_1(\sigma(t'), \tau(t')) = 0$, in
which case either $\sigma(t') = \tau(t')$ or $d_2(\sigma(t'), \tau(t')) = n - 1$. If the former holds,
we are done, so suppose the latter. We now carry on from $(\sigma(t'), \tau(t'))$ using
the identity coupling (that is to say, the coupling that chooses the same vertex
in both copies, and proposes the same color $c$ in both). We will show in
Section 4.3.1 below that, if we take any $(\sigma, \tau) \in \Omega \times \Omega$ and produce $(\sigma', \tau')$
by one step of the identity coupling, then $E[d_2(\sigma', \tau')] \le d_2(\sigma, \tau)$. Thus, $D_t =
d_2(\sigma(t' + t), \tau(t' + t))$ is a super-martingale with $D_0 = n - 1$. In Section 4.3.2
below, we will define a quantity $V = \Theta(1/n)$ and show that, for all $t$ and
all values of $D_t$ other than 0, $E[(D_{t+1} - D_t)^2 \mid D_t] \ge V$. Let $B = 10n$, and
let $T$ be the first time at which either (a) $D_t = 0$ (i.e., coupling occurs), or
(b) $D_t \ge B$. Note that $T$ is a stopping time. Define $Z_t = (B - D_t)^2 - Vt$,
and observe (see [25]) that $Z_{t \wedge T}$ is a sub-martingale, where $t \wedge T$ denotes
the minimum of $t$ and $T$. Let $p$ be the probability that (a) occurs. By
the optional stopping theorem $E[D_T] \le D_0$, so $(1 - p)B \le E[D_T] \le D_0$ and
$p \ge 1 - D_0/B \ge \frac{9}{10}$. Also, by the optional stopping theorem,

\begin{align*}
pB^2 + (1 - p)E[(B - D_T)^2 \mid D_T \ge B] - V E[T]
&= E[(B - D_T)^2] - V E[T] = E[Z_T] \\
&\ge Z_0 = (B - D_0)^2 \ge 0.
\end{align*}

Since $|D_t - D_{t-1}| \le 2$, $(1 - p)E[(B - D_T)^2 \mid D_T \ge B] \le 4 \le pB^2$ so $E[T] \le
(2pB^2)/V$. Conditioning on (a) occurring, it follows that $E[T \mid D_T = 0] \le
2B^2/V$. Hence, $\Pr(T > 20B^2/V \mid D_T = 0) \le \frac{1}{10}$. So, if we now run the identity
coupling for $20B^2/V = O(n^3)$ steps, then $\sigma$ and $\tau$ will fail to couple with
probability at most $\frac{1}{40} + 2 \times \frac{1}{10} < \frac{1}{4}$. Thus, we have shown the following.

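Theorem 3. Let $G$ be the $n$-vertex path, and let $q = 3$. Consider the
Markov chain $\mathcal{M}_{\mathrm{Gl}}$ on the state space $\Omega$. Then $\mathrm{Mix}(\mathcal{M}_{\mathrm{Gl}}, \frac{1}{4}) = O(n^3 \log n)$.

We can boost the coupling probability in the usual way to bound $\mathrm{Mix}(\mathcal{M}_{\mathrm{Gl}}, \epsilon)$
for $\epsilon < 1/4$.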

4.3.1. The coupling breaks even. Recall from Section 4.3 that $\lambda_1 = \lambda_n =
\frac{1}{2}$ and $\lambda_2 = \cdots = \lambda_{n-1} = 1$.

Lemma 4. Suppose $(\sigma, \tau) \in S$ differs at vertex $i$. Obtain $(\sigma', \tau')$ by one
step of the identity coupling. Then $E[d_2(\sigma', \tau')] \le d_2(\sigma, \tau)$.

Proof. Recall that $n \ge 3$. There are three cases.
Suppose $i \in \{1, n\}$. Then $E[d_2(\sigma', \tau')] - \frac{1}{2}$ is equal to

\[ -\frac{2}{3n}\lambda_1 + \frac{1}{3n}\lambda_2 = 0. \]

The first term in the sum comes from the two colors which could be chosen
at vertex $i$, causing coupling. The second term comes from the one bad color
which could be chosen at $i$'s neighbor, causing one of the height functions
to change by 2.

Suppose $i \in \{2, n-1\}$. Then $E[d_2(\sigma', \tau')] - 1$ is equal to

\[ \frac{2}{3n}\lambda_1 - \frac{2}{3n}\lambda_2 + \frac{1}{3n}\lambda_3 = 0. \]

Suppose $i \in \{3, \ldots, n-2\}$. Then $E[d_2(\sigma', \tau')] - 1$ is equal to

\[ \frac{1}{3n}\lambda_{i-1} - \frac{2}{3n}\lambda_i + \frac{1}{3n}\lambda_{i+1} = 0. \qquad \Box \]

We can conclude from Lemma 4 by path-coupling that, if we take any
$(\sigma, \tau) \in \Omega \times \Omega$ and produce $(\sigma', \tau')$ by one step of the identity coupling, then
$E[d_2(\sigma', \tau')] \le d_2(\sigma, \tau)$.
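The three case computations in the proof of Lemma 4 reduce to exact rational arithmetic with the weights $\lambda_1 = \lambda_n = \frac{1}{2}$ and $\lambda_2 = \cdots = \lambda_{n-1} = 1$. The sketch below simply evaluates the three displayed expressions; it assumes $n \ge 5$ so that the vertices named in each case are distinct and $\lambda_3 = 1$, and the function name is hypothetical.

```python
from fractions import Fraction


def lemma4_drifts(n):
    """Evaluate the three expected-change expressions from the proof of Lemma 4.

    Returns (end, near_end, middle) for i in {1, n}, i in {2, n-1} and each
    i in {3, ..., n-2}. Assumes n >= 5 (an assumption of this sketch).
    """
    if n < 5:
        raise ValueError("this sketch assumes n >= 5")
    lam = [None] + [Fraction(1)] * n      # 1-indexed weights lambda_1..lambda_n
    lam[1] = lam[n] = Fraction(1, 2)
    p = Fraction(1, 3 * n)                # probability of a given (vertex, color)
    end = -2 * p * lam[1] + p * lam[2]
    near_end = 2 * p * lam[1] - 2 * p * lam[2] + p * lam[3]
    middle = [p * lam[i - 1] - 2 * p * lam[i] + p * lam[i + 1]
              for i in range(3, n - 1)]
    return end, near_end, middle
```

All three quantities evaluate to zero, which is why the weights at the two endpoints are chosen to be one half.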

4.3.2. Lower bounding $V$. Let $w = \min_i \lambda_i = \frac{1}{2}$. Start with $\sigma$ and $\tau$ such
that $\sigma \ne \tau$. We will identify a vertex $z$ and a color $C$ such that, if we obtain
$\sigma'$ from $\sigma$ by trying $C$ at $z$ and we obtain $\tau'$ from $\tau$ by trying $C$ at $z$, then
$d_2(\sigma', \tau') \le d_2(\sigma, \tau) - w$. Since $(z, C)$ is chosen with probability $1/(3n)$, we
get $V = w^2/(3n)$.

Our method is this. Given $\sigma$ and $\tau$, choose $h \in \mathcal{H}(\sigma)$ and $h^* \in \mathcal{H}(\tau)$ such
that $d_2(\sigma, \tau) = d(h, h^*)$. Construct $h'$ from $h$ by applying the choice $(z, C)$ (to
be specified presently) in $h$ and construct $h'^*$ from $h^*$ by applying the same
choice $(z, C)$ in $h^*$. We will show that $d(h', h'^*) \le d(h, h^*) - w$ so $d_2(\sigma', \tau') \le
d(h', h'^*) \le d_2(\sigma, \tau) - w$.

Without loss of generality, assume that there is a vertex $v$ such that $h_v >
h_v^*$. Let $m = \max_v (h_v - h_v^*) > 0$ and let $R = \{v \mid h_v - h_v^* = m\}$. By construction,
$R$ is nonempty.

Case 1. $R$ is the whole line. Let $z$ be any local maximum in $h$ and let
$C$ be the color that is not used at $z$ or at its neighbors in $h$. $z$ is also a local
maximum in $h^*$ (since $R$ is the whole line), but $C$ is used either at $z$ or at
its neighbors in $h^*$. (The unique color $C'$ that is not used either at $z$ or its
neighbors in $h^*$ must be different from $C$, since $\sigma \ne \tau$.) Choose $(z, C)$. Then
$h'_z = h_z - 2$. But $h'^*_z = h^*_z$. So $d(h, h^*) - d(h', h'^*) = \lambda_z$.

Case 2. There is a vertex $z \in R$, all of whose neighbors are in $\overline{R}$. Note
that all edges from $z$ to $\overline{R}$ in $h$ go down (i.e., height decreases along these
edges). Also, all edges from $z$ to $\overline{R}$ in $h^*$ go up. Thus, $z$ is a local maximum
in $h$ and a local minimum in $h^*$. Let $C$ be the color that is not used at $z$
or at its neighbors in $h$. Choose $(z, C)$. Then $h'_z = h_z - 2$. Since $z$ is a local
minimum in $h^*$, $h'^*_z \ge h^*_z$. Also, $h'_z \ge h'^*_z$ since we choose the same color in
both copies. Thus, $d(h, h^*) - d(h', h'^*) \ge \lambda_z$.

Case 3. There is a vertex $z \in R$ which has a neighbor $w \in \overline{R}$ and a neighbor
$r \in R$. Note that the edge from $z$ to $r$ goes the same direction (up or down)
in $h$ as in $h^*$. Suppose first that it goes down. Then $z$ is a local maximum in $h$.
Let $C$ be the color that is not used at $z$ or at its neighbors in $h$. Choose $(z, C)$.
Then $h'_z = h_z - 2$. Also, $h'^*_z = h^*_z$ (since $z$ has a neighbor below and a neighbor
above, and won't be recolored in $h^*$). Thus, $d(h, h^*) - d(h', h'^*) \ge \lambda_z$.

Suppose instead that the edge from $z$ to $r$ goes up. Then $z$ is a local
minimum in $h^*$. Let $C$ be the color that is not used at $z$ or at its neighbors in
$h^*$. Choose $(z, C)$. Then $h'^*_z = h^*_z + 2$ and $h'_z = h_z$ so $d(h, h^*) - d(h', h'^*) \ge \lambda_z$.

5. Systematic scan for $q = 3$ mixes in $\Theta(n^2 \log n)$ sweeps. As in
Section 4 we consider the path $G$ with vertices 1 through $n$ with $q = 3$ colors.
We consider the dynamics $\mathcal{M}_{\rightarrow}$.

5.1. Analysis of a related Markov chain. As in Section 4.1 the Markov
chain $\mathcal{M}_{\rightarrow}$ can be associated with a Markov chain $\mathcal{M}'_{\rightarrow}$ on $T$. Each move
of $\mathcal{M}'_{\rightarrow}$ starts with a configuration $X \in T$ and makes $n$ moves of the auxiliary chain
on $T$ from Section 4.1 corresponding to the choices $r = 1, r = 2, \ldots, r = n$ (in
order).

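Consider the transition from configuration $X$ to configuration $X'$
corresponding to one sweep of the scan chain on $T$. Let $\widetilde{X}_i$ denote the label ($\pm 1$) of vertex $i$ in
the intermediate configuration which is obtained after the choices $r = 1, r =
2, \ldots, r = i$. Then

\begin{align*}
E[\widetilde{X}_1] &= \tfrac{1}{3} X_1, \\
E[\widetilde{X}_i] &= \tfrac{2}{3} X_i + \tfrac{1}{3} E[\widetilde{X}_{i-1}], & i &= 2, \ldots, n-1, \\
E[X'_i] &= \tfrac{2}{3} E[\widetilde{X}_i] + \tfrac{1}{3} X_{i+1}, & i &= 1, \ldots, n-2, \\
E[X'_{n-1}] &= \tfrac{1}{3} E[\widetilde{X}_{n-1}].
\end{align*}

Solving these gives

\begin{align*}
E[X'_1] &= \frac{2}{9} X_1 + \frac{1}{3} X_2, \\
E[X'_i] &= \frac{2}{3^{i+1}} X_1 + \sum_{j=2}^{i} \frac{4}{3^{i+2-j}} X_j + \frac{1}{3} X_{i+1}, & i &= 2, \ldots, n-2, \\
E[X'_{n-1}] &= \frac{1}{3^{n}} X_1 + \sum_{j=2}^{n-1} \frac{2}{3^{n+1-j}} X_j.
\end{align*}

So the matrix $A$ is

\[ A = \begin{pmatrix}
\frac{2}{9} & \frac{1}{3} & 0 & \cdots & 0 & 0 \\
\frac{2}{27} & \frac{4}{9} & \frac{1}{3} & \cdots & 0 & 0 \\
\frac{2}{81} & \frac{4}{27} & \frac{4}{9} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{2}{3^{n-1}} & \frac{4}{3^{n-2}} & \frac{4}{3^{n-3}} & \cdots & \frac{4}{9} & \frac{1}{3} \\
\frac{1}{3^{n}} & \frac{2}{3^{n-1}} & \frac{2}{3^{n-2}} & \cdots & \frac{2}{27} & \frac{2}{9}
\end{pmatrix}. \]
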

Here $A$ is not symmetric, but is nonnegative and irreducible, so has a positive
eigenvector $w$ corresponding to its (real) largest eigenvalue $\lambda$. Now $w, \lambda$
satisfy the equations

\[
\begin{aligned}
\lambda w_1 &= \tfrac29 w_1 + \sum_{j=2}^{n-2} \frac{2}{3^{j+1}} w_j + \frac{1}{3^{n}} w_{n-1}, \\
\lambda w_i &= \tfrac13 w_{i-1} + \sum_{j=i}^{n-2} \frac{4}{3^{j-i+2}} w_j + \frac{2}{3^{n+1-i}} w_{n-1}, \qquad i = 2,\ldots,n-2, \\
\lambda w_{n-1} &= \tfrac13 w_{n-2} + \tfrac29 w_{n-1}.
\end{aligned}
\]

These can be simplified by subtracting one-third of the $(i+1)$st equation
from the $i$th for $i = 2,\ldots,n-2$, and one-sixth the second from the first,
giving

(9) $\lambda w_2 - (6\lambda - 1)w_1 = 0$,

(10) $\lambda w_{i+1} - (3\lambda - 1)w_i + w_{i-1} = 0$, $i = 2,\ldots,n-2$,

(11) $-3w_{n-2} + (9\lambda - 2)w_{n-1} = 0$.

If $\lambda$ is close to 1, the form of (10) suggests a slightly damped harmonic
oscillation, so we will try a solution of the form $w_i = c_n e^{\gamma i}\sin(\alpha i + \beta)$, where
$c_n > 0$ is a constant, depending on $n$, that can be chosen later. Substituting
this in (10) and equating coefficients of $\sin(\alpha i + \beta)$, $\cos(\alpha i + \beta)$ gives

(12) $\lambda = e^{-2\gamma}$ and $\cos\alpha = (3e^{-\gamma} - e^{\gamma})/2$, that is, $e^{\gamma} = \sqrt{3 + \cos^2\alpha} - \cos\alpha$.

[The second of these follows from $\sin(x + y) = \sin x\cos y + \cos x\sin y$ and
$\sin(x - y) = \sin x\cos y - \cos x\sin y$ and the third used the quadratic formula
with the choice $\cos\alpha > 0$.] Then (9) and (11) give

(13) $\dfrac{\sin(2\alpha + \beta)}{\sin(\alpha + \beta)} = \cos\alpha + \cot(\alpha + \beta)\sin\alpha = 6e^{-\gamma} - e^{\gamma}$

and

(14) $\dfrac{\sin((n-2)\alpha + \beta)}{\sin((n-1)\alpha + \beta)} = \cos\alpha - \cot((n-1)\alpha + \beta)\sin\alpha = \dfrac{9e^{-\gamma} - 2e^{\gamma}}{3}$.

Using (12) to eliminate $\gamma$ in (13) and (14) gives

\[
\tan(\alpha + \beta) = \frac{\sin\alpha}{2\cos\alpha + \sqrt{3 + \cos^2\alpha}}
\]

and

\[
\tan((n-1)\alpha + \beta) = -\frac{3\sin\alpha}{2\cos\alpha + \sqrt{3 + \cos^2\alpha}},
\]

implying

(15) $\tan((n-1)\alpha + \beta) = -3\tan(\alpha + \beta)$ and $\tan\alpha = 4\tan(\alpha + \beta)/(1 - 3\tan^2(\alpha + \beta))$.

To see the second of these equalities, note that the left-hand equation on
the previous line is equivalent to

\[
\tan(\alpha + \beta) = \frac{\sin\alpha}{2\cos\alpha + \sqrt{4\cos^2\alpha + 3\sin^2\alpha}}
\]

using $\cos^2\alpha + \sin^2\alpha = 1$. But this is equal to

\[
\frac{\tan\alpha}{2 + \sqrt{4 + 3\tan^2\alpha}}.
\]

Now solve this for $\tan\alpha$. The equalities in (15) imply

\[
\begin{aligned}
\pi - (n-1)\alpha - \beta &= \arctan(3\tan(\alpha + \beta)), \\
\alpha &= \alpha + \beta + \arctan(3\tan(\alpha + \beta)).
\end{aligned}
\]

The first of these uses $\tan(\pi - x) = -\tan(x)$ and the second uses $\tan(x + y) =
(\tan x + \tan y)/(1 - \tan x\tan y)$, with $x = \alpha + \beta$ and $y = \arctan(3\tan(\alpha + \beta))$.
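So finally we have

\[
\alpha = \frac{\pi}{n-1}, \qquad
\tan\beta = -3\tan\Bigl(\beta + \frac{\pi}{n-1}\Bigr), \qquad
e^{\gamma} = \sqrt{3 + \cos^2\Bigl(\frac{\pi}{n-1}\Bigr)} - \cos\Bigl(\frac{\pi}{n-1}\Bigr).
\]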

Note that $\beta$ is the solution of a trigonometric equation, but it is easily
checked that $-\pi/(n-1) < \beta < 0$. Hence, $w > 0$, corresponding to the largest
eigenvalue $\lambda$ of $A$. Asymptotically, we have

(16) $\alpha \sim \pi/n$, $\beta \sim -3\pi/4n$, $\gamma \sim \pi^2/4n^2$, so $1 - \lambda \sim \pi^2/2n^2$.

We also need to set $c_n \sim 4n/\pi$ to achieve $w_i \ge 1$, for all $0 < i < n$. If we take
$X(0)$ to be the all 1's vector, then it is easy to check that $\Phi_0 \sim 8n^2/\pi^2$.
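
The closed form can be checked numerically. The following Python sketch is an illustration only, not part of the analysis: it builds the matrix $A$ for a given $n \ge 4$ from the entries derived above, solves $\tan\beta = -3\tan(\beta + \pi/(n-1))$ by bisection on $(-\pi/(n-1), 0)$, and measures how far $w_i = e^{\gamma i}\sin(\alpha i + \beta)$ is from satisfying $wA = \lambda w$. The function names `build_A`, `closed_form` and `residual` are hypothetical, and the normalising constant $c_n$ is dropped because it does not affect the eigenvector equation.

```python
import math

def build_A(n):
    """Return the (n-1) x (n-1) matrix A with E[X'] = A X (0-based storage)."""
    m = n - 1
    A = [[0.0] * m for _ in range(m)]
    for i in range(1, m):                      # rows i = 1, ..., n-2
        A[i - 1][0] = 2.0 / 3 ** (i + 1)
        for j in range(2, i + 1):
            A[i - 1][j - 1] = 4.0 / 3 ** (i - j + 2)
        A[i - 1][i] = 1.0 / 3                  # coefficient of X_{i+1}
    A[m - 1][0] = 1.0 / 3 ** n                 # row n-1
    for j in range(2, m + 1):
        A[m - 1][j - 1] = 2.0 / 3 ** (n + 1 - j)
    return A

def closed_form(n):
    """Return (lambda, w) from the formulas for alpha, beta and gamma."""
    alpha = math.pi / (n - 1)
    g = lambda b: math.tan(b) + 3.0 * math.tan(b + alpha)
    lo, hi = -alpha, 0.0                       # g(lo) < 0 < g(hi), g increasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    gamma = math.log(math.sqrt(3.0 + math.cos(alpha) ** 2) - math.cos(alpha))
    lam = math.exp(-2.0 * gamma)
    w = [math.exp(gamma * i) * math.sin(alpha * i + beta) for i in range(1, n)]
    return lam, w

def residual(n):
    """Largest entry of |wA - lambda w|, together with lambda and w."""
    A = build_A(n)
    lam, w = closed_form(n)
    m = n - 1
    wA = [sum(w[i] * A[i][j] for i in range(m)) for j in range(m)]
    return max(abs(wA[j] - lam * w[j]) for j in range(m)), lam, w
```

Calling `residual(n)` for moderate `n` should return a residual at the level of floating-point rounding, and the quantity $(1-\lambda)n^2$ should approach $\pi^2/2$ as `n` grows, in line with (16).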

Next we need to estimate $p$, the bound on the variance of $\Phi_t$, given $\Phi_{t-1}$.
In the case of Glauber dynamics, the range of $\Phi_t$ was $O(1)$, which provided a
crude bound $p = O(1)$. For scan, however, the range of possible values of $\Phi_t$
is $O(n)$, which yields only $p = O(n^2)$: too weak for our purposes. Intuitively,
however, since $\Phi_t$ is, roughly speaking, a sum of $n$ nearly independent r.v.'s
each of variance $O(1)$, the variance of $\Phi_t$ ought to be $O(n)$. This is indeed the
case. In fact, we prove something stronger in the form of a large deviation
result for $\Phi_t$. Before doing that, let's complete the remainder of the proof.
Assuming $p = O(n)$, we have $v = O(n^3)$. Now, from (3), the mixing time of
$\mathcal{M}^{\pm}_{\to}$ is bounded below by $\pi^{-2}n^2\ln n - O(n^2)$. Since the sweep is also monotone, (5)
gives the upper bound $\mathrm{Mix}(\mathcal{M}^{\pm}_{\to},\varepsilon) \le 4\pi^{-2}n^2\ln n + \frac{2n^2}{\pi^2}\ln\varepsilon^{-1} + O(n^2)$. It
may be observed that the bounds for $\mathcal{M}^{\pm}_{\mathrm{Gl}}$ are both about $n$ times these
quantities, so there is no evidence that the scan gives a significant speed-up.
However, there will be a considerable saving in random number generation.

It only remains to show $p = O(n)$. Recall that $p$ is an upper bound
on $\mathrm{E}[\mathrm{var}(\Phi_t \mid \Phi_{t-1})]$, where $\Phi_t = wX(t)$. Also, for $i \in \{1,\ldots,n-1\}$, $w_i =
(4n/\pi)\exp(\gamma i)\sin(\alpha i + \beta)$, where $\gamma$, $\alpha$ and $\beta$ are given asymptotically in
(16). Let $w_0 = w_n = 0$. Our first observation, which is similar to the one
used in the analysis of Glauber dynamics, is

(17) $\max_{1 \le i \le n} |w_i - w_{i-1}| = O(1)$.

To see that (17) holds, first note that $1 < \exp(\gamma i) = 1 + O(1/n)$. Using the
series expansion of sine, we find that $w_1 = O(1)$ and $w_{n-1} = O(1)$. Now, for
$i \in \{2,\ldots,n-1\}$, note that

\[
\begin{aligned}
w_i - w_{i-1} &\le \frac{4n}{\pi}(1 + O(1/n))\sin(\alpha i + \beta) - \frac{4n}{\pi}\sin(\alpha(i-1) + \beta) \\
&\le O(1)\sin(\alpha i + \beta) + \frac{4n}{\pi}\bigl(\sin(\alpha i + \beta) - \sin(\alpha(i-1) + \beta)\bigr).
\end{aligned}
\]

The first term is $O(1)$ because sine is bounded. Since the derivative of sine
is at most 1 (in absolute value), the difference between the two sines in the
second term is at most $\alpha$, so the second term is also $O(1)$.

Let $\omega_1, \omega_2, \ldots, \omega_n$ denote the sequence of swap/no-swap decisions made
by systematic scan in transforming $X(t-1)$ to $X(t)$. That is, $\omega_1$ is the
indicator r.v. for the event that the sign of position 1 is flipped, $\omega_i$ (for
$i \in \{2,\ldots,n-1\}$) is the indicator r.v. for the event that positions $i-1$ and
$i$ are exchanged, and $\omega_n$ is the indicator r.v. for the event that the sign of
position $n$ is flipped. The $\omega_i$'s are independent Bernoulli random variables
with parameter 1/3. Given $X(t-1)$, the configuration $X(t)$ is a r.v. in
$\omega_1, \omega_2, \ldots, \omega_n$. Let $\omega_{n+1} = 0$. Consider the Doob martingale $Z_0, Z_1, \ldots, Z_n$
obtained by revealing the swap/no-swap decisions in sequence:

\[
Z_0 = \mathrm{E}[wX(t)], \quad Z_1 = \mathrm{E}[wX(t) \mid \omega_1], \quad
Z_2 = \mathrm{E}[wX(t) \mid \omega_1,\omega_2], \quad \ldots, \quad Z_n = \mathrm{E}[wX(t) \mid \omega_1,\omega_2,\ldots,\omega_n].
\]

All of $Z_0, \ldots, Z_n$ are conditioned on $X(t-1)$. Notice that $Z_0 = \mathrm{E}[\Phi_t \mid X(t-1)]$
and $Z_n = \Phi_t$. We will show below that $|Z_{i-1} - Z_i| = O(1)$, for all $1 \le i \le n$. It
follows from the Azuma-Hoeffding inequality [4] (see also [21], Chapter 2.4)
that

\[
\Pr\bigl(|\Phi_t - \mathrm{E}[\Phi_t]| > h\sqrt{n}\bigr) = \Pr\bigl(|Z_n - Z_0| > h\sqrt{n}\bigr) \le \exp(-\Omega(h^2)).
\]
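Let $C = \max_i |Z_{i-1} - Z_i|$. Then we get $p = O(n)$ since

\[
\begin{aligned}
\mathrm{E}[\mathrm{var}(\Phi_t \mid \Phi_{t-1})]
&= \sum_{\xi} \Pr(X(t-1) = \xi)\, \mathrm{E}\bigl[(\Phi_t - \mathrm{E}[\Phi_t \mid X(t-1) = \xi])^2 \bigm| X(t-1) = \xi\bigr] \\
&\le \max_{\xi} \mathrm{E}\bigl[(\Phi_t - \mathrm{E}[\Phi_t \mid X(t-1) = \xi])^2 \bigm| X(t-1) = \xi\bigr] \\
&= \max_{\xi} \mathrm{E}\bigl[(Z_n - Z_0)^2 \bigm| X(t-1) = \xi\bigr] \\
&\le n \max_{\xi} \sum_{h \ge 1} h^2 \Pr\bigl(|Z_n - Z_0| \in ((h-1)\sqrt{n}, h\sqrt{n}] \bigm| X(t-1) = \xi\bigr) \\
&\le n \sum_{h \ge 0} (h+1)^2 \exp(-\Omega(h^2)) = O(n).
\end{aligned}
\]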

Finally, we must argue that $|Z_{i-1} - Z_i| = O(1)$. First, note that

\[
|Z_1 - Z_0| = |\mathrm{E}[wX(t) \mid \omega_1] - \mathrm{E}[wX(t)]|
\le |\mathrm{E}[wX(t) \mid \omega_1 = 1] - \mathrm{E}[wX(t) \mid \omega_1 = 0]|,
\]

and the right-hand side is at most

\[
\Bigl| \sum_{k=0}^{n-1} \Pr((\omega_2,\ldots,\omega_{k+2}) = (1,\ldots,1,0))
\times \bigl(\mathrm{E}[wX(t) \mid 1,1,\ldots,1,0] - \mathrm{E}[wX(t) \mid 0,1,\ldots,1,0]\bigr) \Bigr|,
\]

where the conditioning specifies the values of $\omega_1,\ldots,\omega_{k+2}$. The relevant
probability is at most $3^{-k}$. To get an upper bound, we move the absolute
value inside the summation and maximise over $\omega_{k+3},\ldots,\omega_n$, obtaining

\[
\begin{aligned}
|Z_1 - Z_0| &\le \sum_{k=0}^{n-1} 3^{-k} \max \bigl|\mathrm{E}[wX(t) \mid 1,1,\ldots,1,0,\omega_{k+3},\ldots,\omega_n]
- \mathrm{E}[wX(t) \mid 0,1,\ldots,1,0,\omega_{k+3},\ldots,\omega_n]\bigr| \\
&= \sum_{k=0}^{n-1} 3^{-k}\, 2w_{k+1},
\end{aligned}
\]

since the difference in sign propagates to position $k+1$ and then stops. By
(17), this is $O(1)$. Similarly, $|Z_n - Z_{n-1}| = O(1)$. Now consider $i \in \{2,\ldots,n-1\}$.
Mimicking the analysis above, we find that $|Z_i - Z_{i-1}|$ is at most

\[
\sum_{k=0}^{n-i} 3^{-k} \max T,
\]

where $T$ is the absolute value of

\[
\mathrm{E}[wX(t) \mid \omega_1,\ldots,\omega_{i-1},1,1,\ldots,1,0,\omega_{i+k+2},\ldots,\omega_n]
- \mathrm{E}[wX(t) \mid \omega_1,\ldots,\omega_{i-1},0,1,\ldots,1,0,\omega_{i+k+2},\ldots,\omega_n],
\]

which is at most $2|w_{i+k} - w_{i-1}|$, so

\[
|Z_i - Z_{i-1}| \le \sum_{k=0}^{n-i} 3^{-k}\, 2|w_{i+k} - w_{i-1}|,
\]

which is $O(1)$ by (17).

5.2. A lower bound for systematic scan. We will use distance measures
$d_1$ and $d_2$ from Section 4.2. Using distance measure $d_1$, the lower bound
from Section 5.1 applies directly to $\mathcal{M}_{\to}$. Thus, we have the following:

Theorem 5. Let $G$ be the $n$-vertex path, and let $q = 3$. Then a lower
bound on the mixing time of the Markov chain $\mathcal{M}_{\to}$, on the state space $\Omega$, is
given by Mix.(M->, \) > -K~ 2 n 2 lnn — 0(n). 

5.3. An upper bound for systematic scan. As in Section 4.3, we find that,
for some $t' = O(n^2\log n)$, we will have $d_1(\sigma(t'),\tau(t')) = 0$ with probability
at least $\frac{39}{40}$.

Now choose the following weights. Let $\lambda_1 = \frac14$, $\lambda_2 = \cdots = \lambda_{n-1} = 1$ and
$\lambda_n = \frac34$.

We now use path coupling on pairs $(\sigma(0),\tau(0)) \in S$. Start with such
a pair, run $t'$ steps to get $(\sigma(t'),\tau(t'))$. With probability at least 39/40,
$d_1(\sigma(t'),\tau(t')) = 0$. Either $\sigma(t') = \tau(t')$ or $d_2(\sigma(t'),\tau(t')) = n-1$. Suppose
the latter. We now carry on from $(\sigma(t'),\tau(t'))$ using the identity coupling.
We show in Section 5.3.1 that, if we take any $(\sigma,\tau) \in \Omega \times \Omega$ and produce
$(\sigma',\tau')$ by one scan using the identity coupling, then $\mathrm{E}[d_2(\sigma',\tau')] \le d_2(\sigma,\tau)$.
Thus, $D_t = d_2(\sigma(t'+t),\tau(t'+t))$ is a super-martingale with $D_0 = n-1$. In
Section 5.3.2, we define $V = 1/27$ and show that, for all $t$ and all values of
$D_t$ other than 0, $\mathrm{E}[(D_{t+1} - D_t)^2 \mid D_t] \ge V$. Let $B = 10n$, and let $T$ be the
first time at which either (a) $D_t = 0$ (i.e., coupling occurs), or (b) $D_t \ge B$.
Note that $T$ is a stopping time. Define $Z_t = (B - D_t)^2 - Vt$, and observe, as
in Section 4.3, that $Z_{t \wedge T}$ is a sub-martingale. Let $p$ be the probability that
(a) occurs. As in Section 4.3, applying the optional stopping theorem to $D_t$
gives $p \ge \frac{9}{10}$. Also, as before,

\[
pB^2 + (1-p)\,\mathrm{E}[(B - D_T)^2 \mid D_T \ge B] - V\,\mathrm{E}[T] \ge 0.
\]

Since $|D_t - D_{t-1}| \le 2n$, $(1-p)\,\mathrm{E}[(B - D_T)^2 \mid D_T \ge B] \le \frac25 n^2 \le pB^2$ so $\mathrm{E}[T] \le
(2pB^2)/V$. Conditioning on (a) occurring, it follows that $\mathrm{E}[T \mid D_T = 0] \le
2B^2/V$. Hence, $\Pr(T > 20B^2/V \mid D_T = 0) \le \frac{1}{10}$. So, if we now run the identity
coupling for $20B^2/V = O(n^2)$ steps, then $\sigma$ and $\tau$ will fail to couple with
probability at most $\frac{1}{40} + 2 \times \frac{1}{10} < \frac14$. Thus, we have shown the following:

Theorem 6. Let $G$ be the $n$-vertex path, and let $q = 3$. Consider the
Markov chain $\mathcal{M}_{\to}$ on the state space $\Omega$. Then $\mathrm{Mix}(\mathcal{M}_{\to}, \frac14) = O(n^2\log n)$.

We can bound $\mathrm{Mix}(\mathcal{M}_{\to},\varepsilon)$ for $\varepsilon < 1/4$ by boosting the coupling
probability in the usual way.

5.3.1. The coupling breaks even. Recall that the vertices of the path G 
are labeled 1, . . . , n going from left to right. 

Lemma 7. Suppose that $\sigma$ and $\tau$ differ at vertex $i < n$ and agree to the
right of vertex $i$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex
$i+1$, doing the identity coupling. Then

\[
\mathrm{E}[d_2(\sigma',\tau')] - d_2(\sigma,\tau) \le \tfrac12.
\]

Proof. Choose $h \in H(\sigma)$ and $h^* \in H(\tau)$ such that $d_2(\sigma,\tau) = d(h,h^*)$.
Let $h'$, $h'^*$ be the transformed height functions produced by the scan. If $v =
i + \ell$ for $\ell \in \{1,\ldots,n-i-1\}$, then the probability that vertex $v$ is changed
by the coupling is $(\frac13)^{\ell}$. If there is a change, then one of the height functions
changes by 2, so the change in $d(h',h'^*)$ is 1. For $v = n$, the probability that
$v$ changes is $(\frac13)^{n-i-1}\,\frac23$. The change to $d(h',h'^*)$ in this case is $\frac34$. Thus,

\[
\mathrm{E}[d_2(\sigma',\tau')] \le \mathrm{E}[d(h',h'^*)]
\le d(h,h^*) + \frac{\frac13\bigl(1 - (\frac13)^{n-i-1}\bigr)}{1 - 1/3} + \frac12\Bigl(\frac13\Bigr)^{n-i-1}
= d(h,h^*) + \frac12. \qquad \Box
\]

Lemma 8. Suppose that $\sigma$ and $\tau$ differ only at vertex 1. Obtain $\sigma'$ and
$\tau'$ by scanning left to right, starting at vertex 1, doing the identity coupling.
Then

\[
\mathrm{E}[d_2(\sigma',\tau')] \le \tfrac14.
\]

Proof. With probability $\frac23$, the first vertex agrees, so $\sigma' = \tau'$. With
probability $\frac13$, the first vertex is left unchanged. Thus (using Lemma 7),
$\mathrm{E}[d_2(\sigma',\tau')] \le \frac13(\frac14 + \frac12)$. □

Lemma 9. Suppose that $\sigma$ and $\tau$ differ only at vertex 2. Obtain $\sigma'$ and
$\tau'$ by scanning left to right, starting at vertex 1, doing the identity coupling.
Then

\[
\mathrm{E}[d_2(\sigma',\tau')] \le 1.
\]

Proof. Say that $\sigma$ starts 202 and $\tau$ starts 212. Consider the coupling
of first vertex:

• With probability $\frac23$:

The first vertex is made to disagree, for example, $\sigma$ now starts 202 but
$\tau$ starts 012.

The coupling of the second vertex in this case is as follows:

- With probability $\frac13$:

The second vertex is made to agree, for example, both become 1. In
this case, $\mathrm{E}[d_2(\sigma',\tau')] = \frac14$.

- With probability $\frac23$:

The second vertex is unchanged. In this case, $\mathrm{E}[d_2(\sigma',\tau')] \le \frac74$.

• With probability $\frac13$:

The first vertex is unchanged. By analogy to the proof of Lemma 8,
$\mathrm{E}[d_2(\sigma',\tau')] \le \frac13(1 + \frac12) = \frac12$.

Adding it all up, $\mathrm{E}[d_2(\sigma',\tau')] \le \frac23(\frac13 \cdot \frac14 + \frac23 \cdot \frac74) + \frac13 \cdot \frac12 = 1$. □

Lemma 10. Let $2 < i < n$. Suppose that $\sigma$ and $\tau$ differ only at vertex
$i$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex 1, doing the
identity coupling. Then

\[
\mathrm{E}[d_2(\sigma',\tau')] \le 1.
\]

Proof. Say that vertices $i-2, \ldots, i+1$ of $\sigma$ are 1202 and of $\tau$ are
1212.

Consider the coupling of vertex $i-1$:

• With probability $\frac13$:

Vertex $i-1$ is made to disagree, so $\sigma$ now starts 1202 but $\tau$ starts 1012.
By analogy with the proof of Lemma 9, $\mathrm{E}[d_2(\sigma',\tau')] \le \frac13 \cdot 1 + \frac23(1 + 1 + \frac12) = 2$.

• With probability $\frac23$:

Vertex $i-1$ is unchanged. By analogy with the proof of Lemma 8,
$\mathrm{E}[d_2(\sigma',\tau')] \le \frac13(1 + \frac12) = \frac12$.

Adding it all up, $\mathrm{E}[d_2(\sigma',\tau')] \le \frac13 \cdot 2 + \frac23 \cdot \frac12 = 1$. □

Lemma 11. Suppose that $\sigma$ and $\tau$ differ only at vertex $n$. Obtain $\sigma'$ and
$\tau'$ by scanning left to right, starting at vertex 1, doing the identity coupling.
Then

\[
\mathrm{E}[d_2(\sigma',\tau')] \le \tfrac34.
\]

Proof. Say that vertices $n-2, \ldots, n$ of $\sigma$ are 020 and of $\tau$ are 021.
Consider the coupling of vertex $n-1$:

• With probability $\frac13$:

Vertex $n-1$ is made to disagree, so $\sigma$ now ends with 010 and $\tau$ with
021.

The coupling of vertex $n$ (the last) in this case is as follows:

- With probability $\frac13$:

The last vertex is made 0, with resulting cost 1.

- With probability $\frac13$:

The last vertex is unchanged, with resulting cost $1 + \frac34 = \frac74$.

- With probability $\frac13$:

The last vertex becomes 2 in $\sigma$, with resulting cost $1 + 2 \cdot \frac34 = \frac52$. (The
claimed final cost is witnessed by the sequence of transitions 012 →
010 → 020 → 021.)

• With probability $\frac23$:

Vertex $n-1$ is unchanged. Now with probability $\frac23$, vertex $n$ will agree and
with probability $\frac13$ it will be unchanged. Thus, $\mathrm{E}[d_2(\sigma',\tau')] \le \frac13 \cdot \frac34 = \frac14$.

Adding it all up, $\mathrm{E}[d_2(\sigma',\tau')] \le \frac13(\frac13 \cdot 1 + \frac13 \cdot \frac74 + \frac13 \cdot \frac52) + \frac23 \cdot \frac14 = \frac34$. □

Lemmas 8, 9, 10 and 11 show that if $(\sigma,\tau) \in S$ and we obtain $\sigma'$ and $\tau'$
by scanning left to right, starting at vertex 1, doing the identity coupling,
then $\mathrm{E}[d_2(\sigma',\tau')] \le d_2(\sigma,\tau)$. By path coupling, we find that if we take any
$(\sigma,\tau) \in \Omega \times \Omega$ and we produce $(\sigma',\tau')$ by one scan using the identity coupling,
then $\mathrm{E}[d_2(\sigma',\tau')] \le d_2(\sigma,\tau)$.
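
The arithmetic in Lemmas 7 through 11 can be replayed with exact rationals. The sketch below is an illustration under stated assumptions: it takes the weights to be 1/4 at vertex 1, 1 at interior vertices and 3/4 at vertex $n$, and it uses the case probabilities as read from the proofs above (1/3 and 2/3 at each step). The variable names are hypothetical.

```python
from fractions import Fraction as F

# Assumed weights: vertex 1, interior vertices, vertex n.
LAM_FIRST, LAM_MID, LAM_LAST = F(1, 4), F(1), F(3, 4)
THIRD = F(1, 3)
TAIL = F(1, 2)  # the bound of Lemma 7 on the expected increase to the right

def lemma7_increase(n, i):
    """Expected increase in distance when scanning from vertex i+1 to n."""
    interior = sum(THIRD ** l * LAM_MID for l in range(1, n - i))
    last = THIRD ** (n - i - 1) * F(2, 3) * LAM_LAST
    return interior + last

# Lemma 8: the disagreement at vertex 1 survives with probability 1/3.
lemma8 = THIRD * (LAM_FIRST + TAIL)

# Lemma 9: vertex 1 is made to disagree (2/3) or is unchanged (1/3).
lemma9 = (F(2, 3) * (THIRD * LAM_FIRST + F(2, 3) * (LAM_FIRST + LAM_MID + TAIL))
          + THIRD * THIRD * (LAM_MID + TAIL))

# Lemma 10: vertex i-1 is made to disagree (1/3) or is unchanged (2/3).
lemma10 = (THIRD * (THIRD * LAM_MID + F(2, 3) * (2 * LAM_MID + TAIL))
           + F(2, 3) * THIRD * (LAM_MID + TAIL))

# Lemma 11: three equally likely outcomes at vertex n after a disagreement at n-1.
lemma11 = (THIRD * (THIRD * LAM_MID
                    + THIRD * (LAM_MID + LAM_LAST)
                    + THIRD * (LAM_MID + 2 * LAM_LAST))
           + F(2, 3) * THIRD * LAM_LAST)
```

Under these assumptions each bound equals the weight of the vertex at which the pair initially disagrees, which is the sense in which the coupling "breaks even".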

5.3.2. The coupling has enough variance (lower bounding V). Recall that
$\omega = \min_j \lambda_j$. Suppose $\sigma \ne \tau$. In Section 4.3.2, we considered several cases.
For each case, we identified a vertex $z$ and a color $C$ such that, if we obtain
$\sigma'$ from $\sigma$ by trying $C$ at $z$ and we obtain $\tau'$ from $\tau$ by trying $C$ at $z$, then
$d_2(\sigma',\tau') \le d_2(\sigma,\tau) - \omega$.

In this section we reconsider each case. Obtain $\sigma^*$ and $\tau^*$ by scanning
$\sigma$ and $\tau$ left to right, using the identity coupling. For each case in Section
4.3.2, we prove the following:

• If $z > 1$, then there is a color $c_\ell$ (depending only on $\sigma_{z-1}$, $\sigma_z$, $\tau_{z-1}$ and
$\tau_z$) such that choosing color $c_\ell$ for vertex $z-1$ ensures $\sigma^*_{z-1} = \sigma_{z-1}$ and
$\tau^*_{z-1} = \tau_{z-1}$. Actually, it is easy to see that $c_\ell$ exists: just take any color
in $\{\sigma_{z-1},\sigma_z\} \cap \{\tau_{z-1},\tau_z\}$.

• Suppose $z < n$. For any color $c$, there is a color $c_r(c)$ (depending only on
$\sigma_{z-1}$, $\sigma_z$, $\sigma_{z+1}$, $\tau_{z-1}$, $\tau_z$, $\tau_{z+1}$ and $c$) such that if we choose $c_\ell$ for vertex
$z-1$, $c$ for vertex $z$ and $c_r(c)$ for vertex $z+1$, then $\sigma^*_{z+1} = \sigma_{z+1}$ and
$\tau^*_{z+1} = \tau_{z+1}$.

• There is a color $C''$ such that, if we obtain $\sigma'$ from $\sigma$ by trying $C''$ at $z$
and we obtain $\tau'$ from $\tau$ by trying $C''$ at $z$, then $\sigma'_z = \sigma_z$ and $\tau'_z = \tau_z$.

Table 1

Case   $\sigma_{z-1}\cdots\sigma_{z+1}$   $\tau_{z-1}\cdots\tau_{z+1}$   $C''$   $c_r(0)$   $c_r(1)$   $c_r(2)$
1      010                                121                            1       0          1          2
1      010                                202                            0       0          1          2
2      010                                101                            0       0          0          2
2      010                                212                            1       0          1          2
2      010                                020                            0       0          0          2

This is enough to establish $V = 1/27$. We will consider the event that $c_\ell$
is chosen for $z-1$ and, whatever color, $c$, is chosen for $z$, $c_r(c)$ is chosen for
$z+1$. This event occurs with probability 1/9. Conditioned on the fact that
this event occurs, we can choose the color $c$ for vertex $z$ after choosing all
other colors. That is, the choice of $c$ is independent of the rest of the scan.
Let $\sigma'$ and $\tau'$ be random variables defined by a left to right scan of $\sigma$ and
$\tau$, which uses $c_\ell$ at $z-1$ and $c_r(c)$ at $z+1$ and misses out the re-coloring
at $z$.

If $|d_2(\sigma',\tau') - d_2(\sigma,\tau)| \ge \omega/2$, then we choose color $C''$ for vertex $z$ so $\sigma^* =
\sigma'$ and $\tau^* = \tau'$. Otherwise, we choose color $C$ for vertex $z$ so $d_2(\sigma^*,\tau^*) \le
d_2(\sigma',\tau') - \omega$. Either way, we get $|d_2(\sigma^*,\tau^*) - d_2(\sigma,\tau)| \ge \omega/2$.

Cases 1 and 2 from Section 4.3.2 are in Table 1.

In Case 3, say $\sigma_{z-1}\cdots\sigma_{z+1} = 010$ and $\tau_{z-1}\cdots\tau_{z+1}$ is monotonic. Then
$C''$ is any color in $\{0,1\}$, $c_r(2)$ is any color in $\{0,2\} \cap \{\tau_z,\tau_{z+1}\}$ and, for any
$i \ne 2$, $c_r(i)$ is any color in $\{0,1\} \cap \{\tau_z,\tau_{z+1}\}$.

The case where $\tau_{z-1}\cdots\tau_{z+1} = 101$ and $\sigma_{z-1}\cdots\sigma_{z+1}$ is monotonic is
similar.
6. Optimal mixing of Glauber and scan when q = 4. 

6.1. Distance measures. In this section $G$ is the $n$-vertex path. We take
the state space to be $\Omega^+$ (i.e., all colorings, whether proper or not). The
results that we get by analyzing our Markov chains on state space $\Omega^+$ also
apply to the same chains with state space $\Omega$; this is because the chains
do not make transitions from states in $\Omega$ to states outside of $\Omega$. (Thus, the
stationary distribution is uniform on $\Omega$; states in $\Omega^+ \setminus \Omega$ have zero measure.)
We ought to note that, on the extra states in $\Omega^+ \setminus \Omega$, what we are calling a
"Metropolis" update does not strictly fit the official definition. For example,
with a natural definition of the "energy" of a coloring, and using the usual
Metropolis filter, the transition ...001... → ...011... would occur with
positive probability. Nevertheless, we disallow this transition because of the
adjacent color 1 vertices in the final state. However, our version of Metropolis
agrees with the usual one on the significant part of the state space, namely,
$\Omega$.
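
As a concrete illustration of the update rule discussed here, the following sketch implements one left-to-right sweep on a path. It is an assumption-laden reading of the dynamics, not the authors' definition: a proposed color is chosen uniformly from the $q$ colors and is accepted only if no neighbor currently has that color. The function names are hypothetical.

```python
import random

def metropolis_update(coloring, v, c):
    """Try colour c at vertex v (0-based); reject if a neighbour already has colour c."""
    left_ok = v == 0 or coloring[v - 1] != c
    right_ok = v == len(coloring) - 1 or coloring[v + 1] != c
    if left_ok and right_ok:
        coloring[v] = c

def scan_sweep(coloring, q, rng):
    """One systematic-scan sweep: update vertices in the order 1, 2, ..., n."""
    for v in range(len(coloring)):
        metropolis_update(coloring, v, rng.randrange(q))

def is_proper(coloring):
    """True if no two adjacent vertices share a colour."""
    return all(a != b for a, b in zip(coloring, coloring[1:]))
```

With this rule the transition ...001... → ...011... is rejected, and a proper coloring stays proper after any number of sweeps, since an accepted color always differs from both neighbors.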

6.2. Glauber with $q = 4$: $O(n\log n)$ updates suffice. We'll use Theorem
2.2 of [13].

Suppose $(\sigma,\tau) \in S$ differ on vertex $i$. Construct $(\sigma',\tau')$ from $(\sigma,\tau)$ by
using the following coupling. Choose the same vertex $v$ to recolor in $\sigma$ and
in $\tau$. Choose the same color in both copies unless $v \in \{i-1, i+1\}$. In that
case, choose color $\sigma_i$ in one copy, while choosing $\tau_i$ in the other (and choose
the same color otherwise).

We first show that the value $\beta$ in Theorem 2.2 of [13] is 1. That is, we
show that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1$. Consider the choices made in $\sigma$. If we choose
vertex $i-1$ and color $\tau_i$, then the Hamming distance might go up by 1.
Similarly, if we choose vertex $i+1$ and color $\tau_i$, then the Hamming distance
might go up by 1. If we choose vertex $i$ and any of the (at least two) colors
not in $\{\sigma_{i-1},\sigma_{i+1}\}$, then the Hamming distance goes down by 1. These are
the only choices which can cause the distance to change.

Now consider a multi-step coupling from $(\sigma,\tau)$. Assume for now that
$i \in \{3,\ldots,n-2\}$, so there are at least 2 vertices to the left of vertex $i$ and
at least 2 vertices to the right of vertex $i$. The other cases are easier and
we will consider them later. Let $c$ be a color which is not in $\{\sigma_i,\tau_i\}$ (there
are two such colors, but choose an arbitrary one and call it $c$). Let $\Psi$ be the
set containing the following 6 choices (in $\sigma$): choose $i$ with any color, choose
$i-1$ with $\tau_i$, or choose $i+1$ with $\tau_i$. Let the stopping time $T$ be the first
time a choice from $\Psi$ is made. [I.e., a choice from $\Psi$ is made in the transition
from $(\sigma(T-1),\tau(T-1))$ to $(\sigma(T),\tau(T))$, where $(\sigma(0),\tau(0)) = (\sigma,\tau)$.]

Let $\mathcal{S}$ be the set containing the following 14 choices (in $\sigma$): Choose $i-1$
with any color besides $\tau_i$. Choose $i+1$ with any color besides $\tau_i$. Choose
$i-2$ with any color. Choose $i+2$ with any color. Let $C$ be the set containing
all $4n-20$ choices that are not in $\Psi$ or $\mathcal{S}$. Let $z_1,\ldots,z_t$ denote the choices
made (in $\sigma$) in the transitions $(\sigma(0),\tau(0)), (\sigma(1),\tau(1)), \ldots, (\sigma(t),\tau(t))$. We
will say that the sequence $z_1,\ldots,z_t$ is good if the only choices in $\mathcal{S} \cup \Psi$ are
the following:

• for some $t_1 \in [1,t]$, $z_{t_1}$ consists of vertex $i-2$ along with the "smallest"
color (e.g., smallest numerically) that is not in $\{c, \sigma_{i-3}(t_1-1), \sigma_{i-1}\}$, and

• for some $t_2 \in [t_1+1,t]$, $z_{t_2}$ consists of vertex $i+2$ along with the smallest
color that is not in $\{c, \sigma_{i+3}(t_2-1), \sigma_{i+1}\}$, and

• for some $t_3 \in [t_2+1,t]$, $z_{t_3} = (i-1,c)$,

• for some $t_4 \in [t_3+1,t]$, $z_{t_4} = (i+1,c)$.

Denote by $\mathcal{G}$ the event that $z_1,\ldots,z_{t-1}$ is good. Now,

Let $\alpha$ be any positive constant which is at most 1/6. Let $\delta$ be a positive
constant, independent of $n$, such that, for all $t \in [\alpha n, n]$, the expression in
(18) is at least $\delta$. Now

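\[
\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t \text{ and } \mathcal{G}] \le \frac{3 \times 0 + 1 \times 1 + 2 \times 2}{6} = \frac56.
\]

In particular, $\mathrm{Ham}(\sigma(T),\tau(T)) = 1$ if $z_t = (i,c)$ and $\mathrm{Ham}(\sigma(T),\tau(T)) = 0$ if
$z_t$ consists of vertex $i$ with some other color. Otherwise, $\mathrm{Ham}(\sigma(T),\tau(T)) \le
2$. Similarly,

\[
\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t \text{ and } \neg\mathcal{G}] \le \frac{2 \times 0 + 2 \times 1 + 2 \times 2}{6} = 1,
\]

so

\[
\begin{aligned}
\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t]
&= \Pr(\mathcal{G} \mid T = t)\,\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t \text{ and } \mathcal{G}] \\
&\quad + \Pr(\neg\mathcal{G} \mid T = t)\,\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t \text{ and } \neg\mathcal{G}] \\
&\le \Pr(\mathcal{G} \mid T = t)\bigl(1 - \tfrac16\bigr) + (1 - \Pr(\mathcal{G} \mid T = t)) \\
&= 1 - \tfrac16\Pr(\mathcal{G} \mid T = t).
\end{aligned}
\]

Thus if $t \in [\alpha n, n]$,

\[
\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t] \le 1 - \frac{\delta}{6}.
\]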

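Finally,

\[
\begin{aligned}
\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T \le n]
&= \sum_{t=1}^{n} \mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T = t] \times \Pr(T = t \mid T \le n) \\
&\le 1 - \frac{\delta}{6}\sum_{t=\alpha n}^{n} \Pr(T = t \mid T \le n) \\
&\le 1 - \frac{\delta}{6}\sum_{t=\alpha n}^{n} \Pr(T = t). \qquad (19)
\end{aligned}
\]

Since $\alpha \le 1/6$, we have

\[
\Pr(T < \alpha n) = 1 - \Bigl(\frac{4n-6}{4n}\Bigr)^{\alpha n} = 1 - \Bigl(1 - \frac{6}{4n}\Bigr)^{\alpha n} \le \frac{6\alpha}{4} \le \frac14.
\]

Also,

\[
\Pr(T > n) = \Bigl(\frac{4n-6}{4n}\Bigr)^{n} = \Bigl(1 - \frac{6}{4n}\Bigr)^{n} \le \exp(-6/4) \le \frac14.
\]

Thus, (19) gives

(20) $\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T \le n] \le 1 - \dfrac{\delta}{12}$.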

Now Theorem 2.2 of [13] tells us that

\[
\mathrm{E}[\mathrm{Ham}(\sigma(n),\tau(n)) - 1] \le \Pr(T \le n)\bigl(\mathrm{E}[\mathrm{Ham}(\sigma(T),\tau(T)) \mid T \le n] - 1\bigr),
\]

and by (20), this is at most $-\delta/12$ so

E[Ham(<x(ri),T(n))]<l-— . 

By the "delayed path coupling lemma" of Czumaj et al. (Lemma 2.1 of [13]), 
the mixing time satisfies 

Mrx(M G i,e)< ^ -n. 

In the preceding argument, we assumed that $i \in \{3,\ldots,n-2\}$ so that
vertices $i-1$, $i-2$ and $i+1$, $i+2$ all exist. The argument still goes through
if i has fewer neighbors to the left (or right). In that case, we just modify the 
argument by changing the definition of "good" so that it doesn't mention 
vertices that don't exist. 

Thus, we have proved the following: 

Theorem 12. Let G be the n-vertex path, and let q = 4. Consider the 
Markov chain A4q\ on the state space + . Then Mix(A / ici) e ) ^ ^nlog(ne _1 ), 
where 5 is the constant mentioned above. 

6.3. Systematic scan for $q = 4$: $O(\log n)$ sweeps suffice. We will only
define the coupling for pairs $(\sigma,\tau) \in S$. Each such pair disagrees at a single
vertex $i$. Thus, when we come to re-color a vertex $j$ during the scan, at most
one of $\{j-1, j+1\}$ has a disagreement. The coupling that we will use is
as follows. If vertex $j$ is not adjacent to a disagreement, then we use the
same colors in both copies. On the other hand, if (say) vertex $j-1$ has a
disagreement, then we couple the choice of $\sigma_{j-1}$ for $\sigma_j$ and $\tau_{j-1}$ for $\tau_j$ and
we couple the choice of $\tau_{j-1}$ for $\sigma_j$ and $\sigma_{j-1}$ for $\tau_j$. Otherwise, we choose
the same color in both copies. The coupling if $j+1$ has a disagreement is
similar.

In the following sequence of lemmas, we let $i$ denote the rightmost vertex
where there is a disagreement between the colorings $\sigma$ and $\tau$. Lemmas 13
through 16 are valid for any $q \ge 4$, and we state them in terms of $q$ so that
we can re-use them later for $q > 4$. The first lemma, Lemma 13, analyzes a
scan starting from vertex $i+1$.



Lemma 13. Suppose that $\sigma$ and $\tau$ differ at vertex $i < n$ and agree to the
right of vertex $i$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex
$i+1$. Then

\[
\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] - \mathrm{Ham}(\sigma,\tau) \le \frac{1}{q-1}.
\]

Proof. If $z = i + \ell$ for $\ell \in \{1,\ldots,n-i\}$, then the probability that vertex
$z$ becomes a disagreement after the recoloring is $(1/q)^{\ell}$. Thus, the expected
number of additional disagreements is

\[
\frac1q + \Bigl(\frac1q\Bigr)^{2} + \cdots + \Bigl(\frac1q\Bigr)^{n-i} \le \frac1q \cdot \frac{1}{1 - 1/q} = \frac{1}{q-1}. \qquad \Box
\]
The next two lemmas analyze a scan starting from vertex i. 



Lemma 14. Suppose $(\sigma,\tau) \in S$ differ on vertex $i$. Let $C = |\{\sigma_{i-1},\sigma_{i+1}\}|$.
($C$ is the number of colors that are used at neighbors of $i$ in coloring $\sigma$.)
Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex $i$. Then
$\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le C/(q-1)$.

Proof. Consider the recoloring of vertex $i$ in copy $\sigma$. With probability
$1 - C/q$, the chosen color is not in $\{\sigma_{i-1},\sigma_{i+1}\}$ so $\mathrm{Ham}(\sigma',\tau') = 0$. On the
other hand, no matter what color is chosen for vertex $i$, Lemma 13 guarantees
that (conditioned on this choice) $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1 + 1/(q-1)$. Thus, we
have

\[
\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{C}{q}\Bigl(1 + \frac{1}{q-1}\Bigr) = \frac{C}{q-1}. \qquad \Box
\]

Lemma 15. Suppose colorings $\sigma$ and $\tau$ differ just on vertices $i-1$ and
$i$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex $i$. Then

\[
\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1 + \frac{3}{q-1}.
\]

Proof. Consider the recoloring of vertex $i$ in copy $\sigma$. With probability
at least $1 - 3/q$, the chosen color is not in $\{\sigma_{i-1},\tau_{i-1},\sigma_{i+1}\}$, so $\mathrm{Ham}(\sigma',\tau') =
1$. On the other hand, no matter what color is chosen for vertex $i$, Lemma
13 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 2 + 1/(q-1)$. Thus, we have

\[
\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \Bigl(1 - \frac3q\Bigr) \cdot 1 + \frac3q\Bigl(2 + \frac{1}{q-1}\Bigr),
\]

which simplifies to the claimed upper bound. □

The next three lemmas analyze a scan starting from vertex max{l,i — 1}. 

Lemma 16. Suppose $(\sigma,\tau) \in S$ differ on vertex $i$. Obtain $\sigma'$ and $\tau'$ by
scanning left to right, starting at vertex $\max\{1,i-1\}$. Then

\[
\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{3}{q-1}.
\]

Proof. If $i = 1$, then Lemma 14 with $C = 1$ shows $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le
1/(q-1)$, which is at most the expression given in the statement of the
lemma. Suppose $i > 1$. Consider the recoloring of vertex $i-1$ in copy $\sigma$. With
probability $1/q$, color $\tau_i$ is chosen. By Lemma 15, $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1 + 3/(q-1)$.
Otherwise, $\sigma'_{i-1} = \tau'_{i-1}$, so Lemma 14 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le
2/(q-1)$. Hence,

\[
\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac1q\Bigl(1 + \frac{3}{q-1}\Bigr) + \Bigl(1 - \frac1q\Bigr)\frac{2}{q-1} = \frac{3}{q-1},
\]

as claimed. □

For the rest of this section, we restrict attention to the case q = 4, which 
corresponds to the "break even" situation in Lemma 16. 
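
The simplifications in Lemmas 14, 15 and 16 are routine but easy to get wrong by hand. The following sketch, offered only as a check, evaluates the unsimplified right-hand sides with exact rationals for several values of $q$ and compares them with the stated closed forms. The function names are hypothetical.

```python
from fractions import Fraction as F

def lemma13(q):
    """Bound on the expected number of new disagreements to the right."""
    return F(1, q - 1)

def lemma14(q, C):
    """(C/q) * (1 + 1/(q-1)), the unsimplified bound of Lemma 14."""
    return F(C, q) * (1 + lemma13(q))

def lemma15(q):
    """(1 - 3/q) * 1 + (3/q) * (2 + 1/(q-1)), the unsimplified bound of Lemma 15."""
    return (1 - F(3, q)) * 1 + F(3, q) * (2 + lemma13(q))

def lemma16(q):
    """(1/q) * Lemma 15 + (1 - 1/q) * Lemma 14 with C = 2."""
    return F(1, q) * lemma15(q) + (1 - F(1, q)) * lemma14(q, 2)

def geometric_partial(q, terms):
    """1/q + (1/q)^2 + ... + (1/q)^terms, as in the proof of Lemma 13."""
    return sum(F(1, q) ** l for l in range(1, terms + 1))
```

For $q = 4$ the last bound is exactly 1, which is the "break even" situation; for larger $q$ it is strictly below 1.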

Lemma 17. Suppose $(\sigma,\tau) \in S$ differ on vertex $i < n$. Suppose that $\sigma_{i+1} \notin
\{\sigma_i,\tau_i,\sigma_{i-2}\}$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex
$\max\{1,i-1\}$. Then $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{11}{12}$.

Proof. If $i = 1$, then the lemma follows from Lemma 14 with $C =
1$. Suppose $i > 1$. Consider the recoloring of vertex $i-1$ in copy $\sigma$. With
probability $\frac14$, color $\sigma_{i+1}$ is chosen. The same color is chosen in copy $\tau$, and
Lemma 14 with $C = 1$ guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac13$. With probability
$\frac12$, the color chosen for vertex $i-1$ is not in $\{\sigma_{i+1},\tau_i\}$, so $\sigma'_{i-1} = \tau'_{i-1}$. By
Lemma 14 with $C = 2$, $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac23$. Otherwise, Lemma 15 guarantees
that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 2$. Thus, we have $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac14 \cdot \frac13 + \frac12 \cdot \frac23 + \frac14 \cdot 2 =
\frac{11}{12}$. □

Lemma 18. Suppose $(\sigma,\tau) \in S$ differ on vertex $i < n$. Suppose that $\sigma_{i+1} = \sigma_i$.
Suppose that $\sigma_i \ne \sigma_{i-2}$ and $\tau_i \ne \sigma_{i-2}$. Obtain $\sigma'$ and $\tau'$ by scanning left to
right, starting at vertex $\max\{1,i-1\}$. Then $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{11}{12}$.

Proof. If $i = 1$, then the lemma follows from Lemma 14 with $C = 1$.

Suppose $i > 1$. Consider the recoloring of vertex $i-1$. With probability
$\frac14$, color $\tau_i$ is chosen in copy $\sigma$ and $\sigma_i$ is chosen in copy $\tau$. Both of
these choices are accepted. In this case, consider the recoloring of vertex
$i$. With probability $\frac12$, the color chosen is not in $\{\sigma_i,\tau_i\}$ and is accepted
in both copies, leaving $\mathrm{Ham}(\sigma',\tau') = 1$. Otherwise, by Lemma 13,
$\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac73$. Thus, conditioned on this color choice for vertex $i-1$,
we have $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac12 \cdot 1 + \frac12 \cdot \frac73 = \frac53$. For any other choice at vertex
$i-1$, Lemma 14 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac23$. We conclude that

$\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac14 \cdot \frac53 + \frac34 \cdot \frac23 = \frac{11}{12}$. □

For the remaining lemmas, we analyze a scan starting from vertex 1. 
These three lemmas imply the result. 

Lemma 19. Suppose $(\sigma,\tau) \in S$ differ on vertex $n$. Obtain $\sigma'$ and $\tau'$ by
scanning left to right, starting at vertex 1. Then $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{11}{16}$.

Proof. Consider the recoloring of vertex $n-1$ in coloring $\sigma$. With
probability $\frac14$, color $\tau_n$ is chosen. In this case, $\mathrm{Ham}(\sigma',\tau') \le 2$. Otherwise,
$\sigma'_{n-1} = \tau'_{n-1}$ so $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac14$. Thus, $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac14 \cdot 2 + \frac34 \cdot \frac14 =
\frac{11}{16}$. □

Lemma 20. Suppose $(\sigma,\tau) \in S$ differ on vertex $i < n$. Suppose that $\sigma_{i+1} \notin
\{\sigma_i,\tau_i\}$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex 1. Then
$\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 47/48$.

Proof. If $i \le 2$, then the lemma follows from Lemma 17. Suppose $i > 2$
and consider the recoloring of vertex $i-2$. With probability $\frac14$, the color that
is chosen is the first color that is not in $\{\sigma_{i-3},\sigma_{i-1},\sigma_{i+1}\}$. This is accepted
so Lemma 17 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{11}{12}$. Otherwise, Lemma 16
guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1$. Putting this together, $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le
\frac14 \cdot \frac{11}{12} + \frac34 \cdot 1 = \frac{47}{48}$. □

Lemma 21. Suppose $(\sigma,\tau) \in S$ differ on vertex $i < n$. Suppose that $\sigma_{i+1} =
\sigma_i$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex 1. Then
$\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 191/192$.

Proof. If $i \le 2$, then the lemma follows from Lemma 18. Next suppose
$i = 3$. Consider the recoloring of vertex $i-2$. With probability $\frac14$, it
is recolored with the first color that is not in $\{\sigma_{i-1},\sigma_i,\tau_i\}$. Now Lemma
18 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{11}{12}$. Otherwise, Lemma 16 guarantees
that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1$. Our conclusion for $i = 3$ is that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le
\frac14 \cdot \frac{11}{12} + \frac34 \cdot 1 = \frac{47}{48}$. Finally, suppose $i > 3$. Consider the recoloring of vertex
$i-3$. Let $c$ be the first color that is not in $\{\sigma_{i-1},\sigma_i,\tau_i\}$. With probability
$\frac14$, the color that is chosen for vertex $i-3$ is the first color that is not
in $\{\sigma_{i-4},\sigma_{i-2},c\}$. Suppose this happens. Then with probability $\frac14$, $c$ is chosen
for vertex $i-2$. Then Lemma 18 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{11}{12}$.
Otherwise, Lemma 16 guarantees that $\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le 1$. We conclude that

$\mathrm{E}[\mathrm{Ham}(\sigma',\tau')] \le \frac{1}{16} \cdot \frac{11}{12} + \frac{15}{16} \cdot 1 = \frac{191}{192}$. □

Lemmas 19, 20 and 21 imply the following result (by path coupling).

Theorem 22. Let $G$ be the $n$-vertex path and let $q = 4$. Consider the
Markov chain $\mathcal{M}_{\to}$ on the state space $\Omega^+$. Then $\mathrm{Mix}(\mathcal{M}_{\to},\varepsilon) \le 192\log(n\varepsilon^{-1})$.
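
The constants in Lemmas 17 through 21 follow from the general bounds of Lemmas 13 through 16 specialised to $q = 4$. The sketch below recomputes them with exact rationals; the case probabilities are taken as read from the proofs above, so this is a consistency check under that reading rather than an independent derivation. All names are hypothetical.

```python
from fractions import Fraction as F

Q = 4

def l14(C):
    """Lemma 14 bound C/(q-1) at q = 4."""
    return F(C, Q - 1)

L13 = F(1, Q - 1)       # Lemma 13: at most 1/3 extra disagreements
L15 = 1 + F(3, Q - 1)   # Lemma 15: equals 2 at q = 4
L16 = F(3, Q - 1)       # Lemma 16: equals 1 at q = 4

lemma17 = F(1, 4) * l14(1) + F(1, 2) * l14(2) + F(1, 4) * L15
lemma18 = F(1, 4) * (F(1, 2) * 1 + F(1, 2) * (2 + L13)) + F(3, 4) * l14(2)
lemma19 = F(1, 4) * 2 + F(3, 4) * F(1, 4)
lemma20 = F(1, 4) * lemma17 + F(3, 4) * L16
lemma21 = F(1, 16) * lemma18 + F(15, 16) * L16

# Worst-case contraction factor per sweep and the resulting constant.
worst = max(lemma19, lemma20, lemma21)
sweeps_constant = 1 / (1 - worst)
```

The worst case is the bound of Lemma 21, and the reciprocal of the contraction gap reproduces the constant 192 in Theorem 22.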

6.4. Lower bounds for $q \ge 4$. In this section we prove that Glauber
requires $\Omega(n\log n)$ updates and scan requires $\Omega(\log n)$ sweeps. We use the
"disagreement percolation" method of van den Berg [28].



6.4.1. Calculating the stationary distribution for bounded line segments. Consider an $s$-edge path (for any $s$). Consider the $q \times q$ "transfer matrix"

$$A = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \\ 1 & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 0 \end{pmatrix}.$$

Note that $A^s[i,j]$ is the number of colorings of the path in which the right vertex is colored with color $i$ and the left vertex is colored with color $j$. We will write $e_i$ to denote the row vector with a 1 in column $i$ and zeros elsewhere. Write $f$ to denote the row vector $(1,1,\ldots,1)$. Write $v_i$ to denote the row vector with $q-1$ in column $i$ and $-1$ elsewhere. Let $e_i'$, $f'$ and $v_i'$ be the corresponding column vectors. Thus, $e_i A^s e_j'$ is the number of colorings from color $i$ on the right to color $j$ on the left.

Now the (right) eigenvectors of $A$ are $f'$ with eigenvalue $q-1$ and, for every $j$, $v_j'$ with eigenvalue $-1$. Since $e_j' = q^{-1}f' + q^{-1}v_j'$, we have

$$A^s e_j' = q^{-1}(A^s f' + A^s v_j') = q^{-1}\bigl((q-1)^s f' + (-1)^s v_j'\bigr),$$

by induction. Thus, if $s$ is even, we have

$$A^s e_j' = q^{-1}\bigl((q-1)^s f' + v_j'\bigr).$$

So, for $i \ne j$, the number of paths from color $i$ to color $j$ is

$$e_i A^s e_j' = q^{-1}\bigl((q-1)^s - 1\bigr). \tag{21}$$

Also, the number of paths from color $j$ to color $j$ is

$$e_j A^s e_j' = q^{-1}\bigl((q-1)^s + q - 1\bigr) = q^{-1}(q-1)^s\bigl(1 + (q-1)^{-(s-1)}\bigr). \tag{22}$$
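The closed forms (21) and (22) can be checked against direct powers of the transfer matrix. The following is a minimal sketch in plain Python; the function names are illustrative, and `closed_form` uses the general-$s$ expression $q^{-1}((q-1)^s f' + (-1)^s v_j')$, which reduces to (21) and (22) when $s$ is even.

```python
def transfer_matrix(q):
    """q x q matrix with 0 on the diagonal and 1 elsewhere."""
    return [[0 if i == j else 1 for j in range(q)] for i in range(q)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, s):
    """A**s by repeated multiplication (exact integer arithmetic)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(s):
        result = mat_mul(result, A)
    return result

def closed_form(q, s, same_color):
    """Number of proper colorings of an s-edge path with prescribed end colors."""
    sign = (-1) ** s
    if same_color:
        return ((q - 1) ** s + sign * (q - 1)) // q   # (22) when s is even
    return ((q - 1) ** s - sign) // q                 # (21) when s is even
```

For example, `mat_pow(transfer_matrix(4), 2)[0][0]` counts the 2-edge paths whose two end vertices both have color 0, and agrees with `closed_form(4, 2, True)`.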

6.4.2. Calculating the induced distribution on the color of an internal vertex. Suppose that $\ell$ and $r$ are positive even integers and let $k = \ell + r$. Consider a path on vertices $1, \ldots, 1+k$. Consider the uniform distribution on colorings in which vertices 1 and $1+k$ are both colored with color $j$.

We wish to bound the probability that vertex $1+\ell$ is colored with color $j$. From (22), this is

$$\frac{q^{-1}(q-1)^{\ell}\bigl(1+(q-1)^{-(\ell-1)}\bigr)\times q^{-1}(q-1)^{r}\bigl(1+(q-1)^{-(r-1)}\bigr)}{q^{-1}(q-1)^{k}\bigl(1+(q-1)^{-(k-1)}\bigr)} = q^{-1}\,\frac{\bigl(1+(q-1)^{-(\ell-1)}\bigr)\bigl(1+(q-1)^{-(r-1)}\bigr)}{1+(q-1)^{-(k-1)}} \ge q^{-1}\bigl(1+(q-1)^{-(r-1)}\bigr). \tag{23}$$
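The ratio in (23) can be evaluated exactly for small parameters. This sketch assumes the even-length count from (22) and uses hypothetical helper names; it computes the probability that the internal vertex $1+\ell$ has the same color as the two endpoints, and the lower bound $q^{-1}(1+(q-1)^{-(r-1)})$.

```python
from fractions import Fraction

def same_color_count(q, s):
    """Count (22): proper colorings of an s-edge path (s even) with both ends color j."""
    assert s % 2 == 0
    return Fraction((q - 1) ** s + (q - 1), q)

def prob_middle_matches(q, ell, r):
    """Probability that vertex 1+ell has color j, given vertices 1 and 1+ell+r have color j."""
    return (same_color_count(q, ell) * same_color_count(q, r)
            / same_color_count(q, ell + r))

def lower_bound(q, r):
    """Right-hand side of (23)."""
    return Fraction(1, q) * (1 + Fraction(1, (q - 1) ** (r - 1)))
```

With $q = 3$ and $\ell = r = 2$ the exact probability is $2/3$, compared with the bound $1/2$.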



6.4.3. Dividing the line into segments. Let $r$ be the largest even number not exceeding $\frac13\log_{q-1} n$. Let $\ell$ be the smallest even number that is at least $48\log_e n$. Let $k = \ell + r$ and $m = \lfloor (n-1)/k \rfloor$. For $i \in \{0, \ldots, m-1\}$, let $L_i$ be the vertex $1 + ik$ and $M_i$ be the vertex $1 + ik + \ell$. Finally, let $L_m = 1 + mk$ and $R_i$ be $L_{i+1}$. The idea is to divide the line into line segments. Segment $i$ has left endpoint $L_i$ and "middle" point $M_i$ (which is not quite in the middle!) and right endpoint $R_i$.
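The segment layout can be made concrete with a short sketch. It assumes $r$ is the largest even number not exceeding $\frac13\log_{q-1} n$ and $\ell$ the smallest even number at least $48\ln n$; the function names are illustrative, and floating-point logarithms are used, so values of $n$ that sit exactly on a rounding boundary would need more care.

```python
import math

def largest_even_at_most(x):
    return 2 * math.floor(x / 2)

def smallest_even_at_least(x):
    return 2 * math.ceil(x / 2)

def segment_layout(n, q):
    """Return r, ell, k, m and the vertices L_i, M_i, R_i (1-indexed) for an n-vertex path."""
    r = largest_even_at_most(math.log(n) / (3 * math.log(q - 1)))
    ell = smallest_even_at_least(48 * math.log(n))
    k = ell + r
    m = (n - 1) // k
    L = [1 + i * k for i in range(m + 1)]        # L_0, ..., L_m
    M = [1 + i * k + ell for i in range(m)]      # M_0, ..., M_{m-1}
    R = L[1:]                                    # R_i = L_{i+1}
    return {"r": r, "ell": ell, "k": k, "m": m, "L": L, "M": M, "R": R}
```

For $n = 10^6$ and $q = 4$ this gives $r = 4$, $\ell = 664$ and $k = 668$, which illustrates how close $M_i$ is to the right endpoint $R_i$: the asymmetry is what the remark "not quite in the middle" refers to.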

Let $Z_i$ be the indicator for the event that vertex $M_i$ is colored with color 0. Let $Z = \sum_{i=0}^{m-1} Z_i$. Of course, the expectations of $Z_i$ and $Z$ are only well defined if we focus attention on a particular distribution over $\Omega$. We will use the notation $E_\pi(Z_i)$ to refer to the expectation of $Z_i$ in distribution $\pi$.



6.4.4. Calculating the distribution of $Z$ in stationarity. Let $\pi$ be the uniform distribution on $\Omega$. We can sample from $\pi$ by filling in the colors from left to right. There are $q(q-1)^{n-1}$ possible colorings in $\Omega$. Given the colors of vertices $1, \ldots, n-v$ (for $v < n$), there are $(q-1)^v$ ways to finish the coloring, and these are chosen uniformly. The probability that $Z_1 = 1$ is $1/q$. For any $i > 1$, we can use (22) [observing that it is (22) that determines the upper bound, and not (21)] to see that the probability that $Z_i = 1$, conditioned on colors $\sigma_1, \ldots, \sigma_{M_{i-1}}$, is at most

$$\frac{1}{q}\Bigl(1 + \frac{1}{(q-1)^{k-1}}\Bigr).$$

Thus, $Z$ is dominated from above by the sum of $m$ independent Bernoulli random variables with success probability $p = q^{-1}\bigl(1 + (q-1)^{-(k-1)}\bigr)$. Let $\varepsilon = m^{-3/8}$. By a Chernoff bound,

$$\Pr\nolimits_\pi\bigl(Z \ge (1+\varepsilon)mp\bigr) \le \exp(-\varepsilon^2 mp/3). \tag{24}$$

Note that $(q-1)^{k-1} = \omega(n^{1/3})$ and that $m = \Theta(n/\log n)$. Thus, (24) implies

$$\Pr\nolimits_\pi\bigl(Z \ge q^{-1}m + \tfrac12 mn^{-1/3}\bigr) \le \Pr\nolimits_\pi\bigl(Z \ge (1+\varepsilon)mp\bigr) \le \exp(-\varepsilon^2 mp/3) = o(1). \tag{25}$$
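The left-to-right sampling procedure mentioned at the start of this subsection is easy to sketch. The code below is an illustration with hypothetical names: colors are the integers $0, \ldots, q-1$, the first vertex is uniform over all $q$ colors, and each later vertex is uniform over the $q-1$ colors that differ from its left neighbor, so every one of the $q(q-1)^{n-1}$ proper colorings is equally likely.

```python
import random

def count_path_colorings(n, q):
    """Number of proper q-colorings of the n-vertex path."""
    return q * (q - 1) ** (n - 1)

def sample_path_coloring(n, q, rng):
    """Sample a uniformly random proper q-coloring of the n-vertex path."""
    colors = [rng.randrange(q)]
    for _ in range(n - 1):
        c = rng.randrange(q - 1)      # uniform over the q-1 allowed colors
        if c >= colors[-1]:
            c += 1                    # skip the left neighbor's color
        colors.append(c)
    return colors
```

A call such as `sample_path_coloring(50, 4, random.Random(0))` returns a list of 50 colors in which no two consecutive entries agree.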

6.4.5. An initial distribution for the Markov chains. Equation (25) shows that, in the stationary distribution $\pi$, $Z$ is unlikely to exceed $q^{-1}m + \frac12 mn^{-1/3}$. In this section we will define an initial distribution $\pi_0$ on colorings in $\Omega$.

The idea will be to show that, if $\sigma(0)$ is chosen from $\pi_0$ and $\sigma(0), \sigma(1), \ldots, \sigma(t)$ evolves according to the dynamics (either Glauber or scan) and $t$ is too small, then, in the distribution of $\sigma(t)$, $Z$ is likely to exceed $q^{-1}m + \frac12 mn^{-1/3}$. This allows us to conclude that the chain does not mix by step $t$.

Let $\pi_0$ be the uniform distribution on colorings in which vertices $L_0, \ldots, L_m$ are colored 0. By (23), $E_{\pi_0} Z \ge mq^{-1}\bigl(1 + (q-1)^{-(r-1)}\bigr) \ge mq^{-1}\bigl(1 + (q-1)n^{-1/3}\bigr)$. By a Chernoff bound,

$$\Pr\nolimits_{\pi_0}\bigl(Z \ge q^{-1}m + \tfrac12 mn^{-1/3}\bigr) \ge 1 - o(1). \tag{26}$$

6.4.6. The $t$-step distribution for systematic scan. Suppose $\sigma(0)$ is chosen from $\pi_0$. Let $\sigma(0), \sigma(1), \ldots$ evolve according to the dynamics of $\mathcal{M}_{\to}$. Let $\mathcal{L}_{\to,t}(Z)$ denote the distribution of the random variable $Z$ in the coloring $\sigma(t)$.

Suppose $\tau(0)$ is chosen from $\pi_0$. Let $\tau(0), \tau(1), \ldots$ evolve according to the "clamped dynamics" $\mathcal{M}^c_{\to}$, which is the same as $\mathcal{M}_{\to}$, except that all moves involving vertices $\{L_0, \ldots, L_m\}$ are rejected (so the color of these vertices cannot change). Let $\mathcal{L}^c_{\to,t}(Z)$ denote the distribution of the random variable $Z$ in the coloring $\tau(t)$. By construction, the distribution $\mathcal{L}^c_{\to,t}(Z)$ is the same as the distribution of $Z$ in $\pi_0$. [This follows because $\pi_0$ is the stationary distribution of $\mathcal{M}^c_{\to}$. This can be proved as follows, where $\Omega_0$ is the set of colorings in which vertices $L_0, \ldots, L_m$ are colored 0. Let $P^c_{\to}$ be the transition matrix of $\mathcal{M}^c_{\to}$, and let $P^c_{\leftarrow}$ be the transition matrix of the reversal. Then any stationary distribution $\pi'$ of $\mathcal{M}^c_{\to}$ satisfies

$$\pi'(\sigma') = \sum_{\sigma\in\Omega_0} \pi'(\sigma)P^c_{\to}(\sigma,\sigma') = \sum_{\sigma\in\Omega_0} \pi'(\sigma)P^c_{\leftarrow}(\sigma',\sigma),$$

but the latter equation is satisfied by the uniform distribution $\pi' = \pi_0$. Also, the chain is ergodic so has a unique stationary distribution.]

To upper bound $d_{\mathrm{TV}}(\mathcal{L}_{\to,t}(Z), \mathcal{L}^c_{\to,t}(Z))$, we will consider a joint process $(\sigma(t),\tau(t))$ in which the first component has the same distribution as $(\sigma(t))$ and the second component has the same distribution as $(\tau(t))$. The total variation distance $d_{\mathrm{TV}}(\mathcal{L}_{\to,t}(Z), \mathcal{L}^c_{\to,t}(Z))$ is upper-bounded by the probability that some vertex $M_i$ gets different colors in $\sigma(t)$ and $\tau(t)$.

The particular joint process that we will consider starts with $\sigma(0) = \tau(0)$. To move from $(\sigma(t-1),\tau(t-1))$ to $(\sigma(t),\tau(t))$, we use the "switch coupling." When we consider vertex $v$ for recoloring, we will couple the color choices as follows:

(A) if we consider color $\sigma(v-1)$ for $v$ in $\sigma$, then consider color $\tau(v-1)$ for $v$ in $\tau$,

(B) if we consider color $\tau(v-1)$ for $v$ in $\sigma$, then consider color $\sigma(v-1)$ for $v$ in $\tau$,

(C) otherwise consider the same color for $v$ in $\tau$ as in $\sigma$.
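Rules (A) to (C) amount to swapping two colors in the proposal for the second copy. A minimal sketch, with an illustrative function name and colors represented as integers:

```python
def switch_coupled_color(c, sigma_left, tau_left):
    """Color proposed for v in tau, given that color c is proposed for v in sigma.

    sigma_left and tau_left are the current colors of vertex v-1 in the two copies.
    """
    if c == sigma_left:       # rule (A)
        return tau_left
    if c == tau_left:         # rule (B)
        return sigma_left
    return c                  # rule (C)
```

For fixed colors of vertex $v-1$ this map is a bijection on the color set, so a uniformly random proposal in $\sigma$ induces a uniformly random proposal in $\tau$, which is what makes it a valid coupling. When the two copies agree at $v-1$ the map is the identity.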

We will be particularly interested in $t \le r$. For such a $t$, and for any $i \in \{0, \ldots, m-1\}$, the probability that vertex $M_i$ gets different colors in $\sigma(t)$ and $\tau(t)$ is at most the probability that we chose option (B) in order for vertices $L_i + 1, \ldots, L_i + \ell$ over the $t$ scans.

Say that vertex $L_i + v$ is "interrupting" (i.e., it interrupts the disagreement percolation) if, the first time that we consider this vertex when we have a disagreement at vertex $L_i + v - 1$, we choose some option other than option (B) for vertex $L_i + v$.

The probability that vertex $M_i$ gets different colors in $\sigma(t)$ and $\tau(t)$ is at most the probability that we have fewer than $t$ interrupting vertices in $L_i + 1, \ldots, L_i + \ell$; this probability is dominated (from above) by the probability of having fewer than $t$ successes in $\ell$ Bernoulli trials with success probability $(q-1)/q$. So if we take any $t \le r/2 \le (2/3)\ell(q-1)/q$, a Chernoff bound says that the probability of having fewer than $t$ interrupting vertices is at most

$$\exp\bigl(-(1/3)^2\,\ell(q-1)/(2q)\bigr),$$

which is at most $n^{-2}$ by the definition of $\ell$. Thus, the probability that there exists an $i$ such that vertex $M_i$ gets different colors in $\sigma(t)$ and $\tau(t)$ is at most $mn^{-2} = o(1)$.
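The step "at most $n^{-2}$ by the definition of $\ell$" is short arithmetic. The following sketch assumes $\ell \ge 48\ln n$ and $q \ge 4$, and uses the lower-tail Chernoff bound $\Pr(X \le (1-\delta)\mu) \le \exp(-\delta^2\mu/2)$ with $\mu = \ell(q-1)/q$ and $\delta = 1/3$.

```latex
\begin{align*}
  \Bigl(\tfrac13\Bigr)^{2}\,\frac{\ell(q-1)}{2q}
    &\ge \frac19\cdot 48\ln n\cdot\frac{q-1}{2q}
     \ge \frac19\cdot 48\ln n\cdot\frac38 = 2\ln n
     && \text{since } \tfrac{q-1}{q}\ge\tfrac34 \text{ for } q\ge 4, \\
  \exp\Bigl(-\bigl(\tfrac13\bigr)^{2}\,\frac{\ell(q-1)}{2q}\Bigr)
    &\le e^{-2\ln n} = n^{-2}.
\end{align*}
```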

Thus, for any $t \le r/2$, $d_{\mathrm{TV}}(\mathcal{L}_{\to,t}(Z), \mathcal{L}^c_{\to,t}(Z)) = o(1)$, so, by (26),

$$\Pr\nolimits_{\mathcal{L}_{\to,t}(Z)}\bigl(Z \ge q^{-1}m + \tfrac12 mn^{-1/3}\bigr) \ge 1 - o(1).$$

Combining this with (25), we find that $d_{\mathrm{TV}}(\mathcal{L}_{\to,t}, \pi) \ge 1 - o(1)$ so systematic scan does not mix in $t$ steps. Thus, we obtain the following theorem.

Theorem 23. Let $G$ be the $n$-vertex path, and let $q \ge 4$. Consider the Markov chain $\mathcal{M}_{\to}$ on the state space $\Omega$. For any fixed $\varepsilon < 1$ and sufficiently large $n$,

$$\mathrm{Mix}(\mathcal{M}_{\to},\varepsilon) \ge \tfrac12\bigl(\tfrac13\log_{q-1} n - 2\bigr).$$

6.4.7. The $t$-step distribution for Glauber. A similar argument to that of Section 6.4.6 can be used to show that Glauber dynamics does not mix in $t$ steps for some $t = \Omega(n\log_{q-1} n)$. The particular value of $t$ for which the straightforward argument works is around $nr/(q^2 e)$. We prefer to give a stronger argument which gives a better bound as a function of $q$. The idea for the stronger argument is as follows. In Section 6.4.6 we showed that the distributions $\mathcal{L}_{\to,t}(Z)$ and $\mathcal{L}^c_{\to,t}(Z)$ were close by showing that, with high probability, there was no $i \in \{0, \ldots, m-1\}$ for which a disagreement at vertex $L_i$ or $R_i$ could percolate to vertex $M_i$. Here we observe that the distributions $\mathcal{L}_{\mathrm{Gl},t}(Z)$ and $\mathcal{L}^c_{\mathrm{Gl},t}(Z)$ would be close even if some, but not many, of the percolations occur.

We start with some notation. It will be helpful to keep track of the nearest endpoint to an arbitrary vertex $v$. For this purpose, if $v$ is in the range $L_i + 1, \ldots, M_i - 1$, its "important neighbor" will be vertex $v - 1$. On the other hand, if $v$ is in the range $M_i, \ldots, R_i - 1$, its important neighbor will be vertex $v + 1$.

As in Section 6.4.6, we will consider a process $\sigma(0), \sigma(1), \ldots$ in which $\sigma(0)$ is drawn from $\pi_0$ and $\sigma(t)$ evolves according to $\mathcal{M}_{\mathrm{Gl}}$. We will also consider the process $\tau(0), \tau(1), \ldots$ in which $\tau(0) = \sigma(0)$ and $\tau(t)$ evolves according to a clamped dynamics $\mathcal{M}^c_{\mathrm{Gl}}$ in which moves involving $L_0, L_1, \ldots, L_m$ are rejected. We will construct a joint process $(\sigma(t),\tau(t))$ with $\sigma(0) = \tau(0)$. To move from $(\sigma(t-1),\tau(t-1))$ to $(\sigma(t),\tau(t))$, we choose the same vertex $v$ in both copies. If $v = L_i$ for some $i$, then only $\sigma$ is changed. If $v > L_m$, then we use the same color in both copies. Otherwise, we do a switch coupling, based on the important neighbor, $w$, of $v$. In particular, we couple the color choices as follows:

(A) if we consider color $\sigma(w)$ for $v$ in $\sigma$, then consider color $\tau(w)$ for $v$ in $\tau$,

(B) if we consider color $\tau(w)$ for $v$ in $\sigma$, then consider color $\sigma(w)$ for $v$ in $\tau$,

(C) otherwise consider the same color for $v$ in $\tau$ as in $\sigma$.

Suppose that $t \le (qnr)/(2e(q-1))$. The probability that $M_i$ gets different colors in $\sigma(t)$ and $\tau(t)$ is at most the probability that (at least) one of the following occurs during the $t$ steps:

• During some ordered sequence of $\ell - 1$ steps, the process recolors vertices $L_i + 1, L_i + 2, \ldots, L_i + \ell - 1 = M_i - 1$ using option (B).

• During some ordered sequence of $r$ steps, the process recolors vertices $R_i - 1, R_i - 2, \ldots, R_i - r = M_i$ using option (B).

The probability that one of these occurs is at most

$$2\binom{t}{r}\Bigl(\frac{1}{qn}\Bigr)^{r} \le 2\Bigl(\frac{et}{rqn}\Bigr)^{r} \le \frac18\Bigl(\frac{2et}{rqn}\Bigr)^{r} \le \frac18\Bigl(\frac{1}{q-1}\Bigr)^{r},$$

where we have crudely used $r \ge 4$ in the second inequality.
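The chain of inequalities above can be sanity-checked numerically. The sketch below is an illustration only: it evaluates the logarithms of the four expressions $2\binom{t}{r}(qn)^{-r}$, $2(et/(rqn))^r$, $\frac18(2et/(rqn))^r$ and $\frac18(q-1)^{-r}$, using `math.lgamma` for the binomial coefficient; the function names are hypothetical.

```python
import math

def log_binom(t, r):
    """Natural log of the binomial coefficient C(t, r)."""
    return math.lgamma(t + 1) - math.lgamma(r + 1) - math.lgamma(t - r + 1)

def max_steps(n, q, r):
    """Largest integer t with t <= q*n*r / (2*e*(q-1))."""
    return math.floor(q * n * r / (2 * math.e * (q - 1)))

def percolation_chain_logs(n, q, r, t):
    """Logs of the four successive bounds on the percolation probability."""
    x = math.e * t / (r * q * n)
    return [
        math.log(2) + log_binom(t, r) - r * math.log(q * n),
        math.log(2) + r * math.log(x),
        -math.log(8) + r * math.log(2 * x),
        -math.log(8) - r * math.log(q - 1),
    ]
```

The second step is an equality when $r = 4$, since $2x^r \le \frac18(2x)^r$ is equivalent to $2^r \ge 16$, and the last step is exactly the condition $t \le (qnr)/(2e(q-1))$.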

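Thus,

$$d_{\mathrm{TV}}\bigl(\mathcal{L}_{\mathrm{Gl},t}(Z_i), \mathcal{L}^c_{\mathrm{Gl},t}(Z_i)\bigr) \le \frac18\Bigl(\frac{1}{q-1}\Bigr)^{r}. \tag{27}$$

Now combining (23) and (27) we have

$$\Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i = 1) \ge \Pr\nolimits_{\mathcal{L}^c_{\mathrm{Gl},t}}(Z_i = 1) - d_{\mathrm{TV}}\bigl(\mathcal{L}_{\mathrm{Gl},t}(Z_i), \mathcal{L}^c_{\mathrm{Gl},t}(Z_i)\bigr) \ge q^{-1} + \tfrac58 n^{-1/3}.$$

So

$$E_{\mathcal{L}_{\mathrm{Gl},t}}(Z) \ge q^{-1}m + \tfrac58 mn^{-1/3}.$$

Also,

$$\mathrm{var}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i) = \Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i = 1)\Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i = 0) \le 1.$$

We will show in Lemma 25 (below) that, for $i \ne j$, $\mathrm{cov}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i, Z_j) \le m^{-1}$. So

$$\mathrm{var}_{\mathcal{L}_{\mathrm{Gl},t}}(Z) = \sum_i \mathrm{var}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i) + \sum_{i \ne j} \mathrm{cov}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i, Z_j) \le m + \sum_{i \ne j} \mathrm{cov}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i, Z_j) \le 2m.$$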

Let

$$\lambda = \frac{mn^{-1/3}}{8\sqrt{2m}}.$$

Note that $\lambda = \omega(1)$ as a function of $n$. Also,



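$$E_{\mathcal{L}_{\mathrm{Gl},t}}(Z) - \lambda\sqrt{\mathrm{var}_{\mathcal{L}_{\mathrm{Gl},t}}(Z)} \ge q^{-1}m + \tfrac58 mn^{-1/3} - \lambda\sqrt{2m} = q^{-1}m + \tfrac12 mn^{-1/3}.$$

Thus, by Chebyshev's inequality, we have

$$\Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}\bigl(Z \le q^{-1}m + \tfrac12 mn^{-1/3}\bigr) \le \Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}\Bigl(Z \le E_{\mathcal{L}_{\mathrm{Gl},t}}(Z) - \lambda\sqrt{\mathrm{var}_{\mathcal{L}_{\mathrm{Gl},t}}(Z)}\Bigr) \le \lambda^{-2} = o(1).$$

Combining this with (25), we find that $d_{\mathrm{TV}}(\mathcal{L}_{\mathrm{Gl},t}, \pi) \ge 1 - o(1)$, so Glauber dynamics does not mix in $t$ steps for any $t \le (qnr)/(2e(q-1))$. Thus, we obtain the following theorem.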

Theorem 24. Let $G$ be the $n$-vertex path, and let $q \ge 4$. Consider the Markov chain $\mathcal{M}_{\mathrm{Gl}}$ on the state space $\Omega$. For any fixed $\varepsilon < 1$ and sufficiently large $n$,

$$\mathrm{Mix}(\mathcal{M}_{\mathrm{Gl}},\varepsilon) \ge \frac{qn\bigl((1/3)\log_{q-1} n - 2\bigr)}{2e(q-1)}.$$

Lemma 25. For $i \ne j$, $\mathrm{cov}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i, Z_j) \le m^{-1}$.

Proof. We will show that $Z_i$ and $Z_j$ have low covariance in the $t$-step distribution by showing that Glauber dynamics (over $t$ steps) is quite close to a "clamped distribution" in which some vertex between $M_i$ and $M_j$ is held fixed. This "disagreement percolation" argument is similar to the argument in Section 6.4.7. The only difference is that, in order to get a sufficiently small upper bound on the covariance, we have to look at a "clamped process" that is slightly different from $\mathcal{M}^c_{\mathrm{Gl}}$. In particular, in $\mathcal{M}^c_{\mathrm{Gl}}$, $M_i$ is only $r$ vertices away from the nearest "clamped vertex," $R_i$. Here we need to spread the clamped vertices out more symmetrically with respect to the vertices $M_i$. Let

$$\Gamma = \{M_i + k/2 \mid i \in \{0, \ldots, m-1\}\}.$$

Consider a process $\sigma(0), \sigma(1), \ldots$ which evolves according to $\mathcal{M}_{\mathrm{Gl}}$. Let $\rho(0), \rho(1), \ldots$ be a process which evolves according to a clamped version of $\mathcal{M}_{\mathrm{Gl}}$ in which those moves involving vertices in $\Gamma$ are rejected. We refer to this dynamics as $\mathcal{M}^{sc}_{\mathrm{Gl}}$ (where "sc" is intended to indicate "symmetric clamped"). Consider the joint process $(\sigma(t),\rho(t))$ which starts with $\rho(0) = \sigma(0)$ and progresses according to the identity coupling [the same vertices and colors are chosen in the transition $\sigma(t-1) \to \sigma(t)$ and in the transition $\rho(t-1) \to$
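$\rho(t)$]. Now the probability that $\sigma(t)_{M_i} \ne \rho(t)_{M_i}$ or $\sigma(t)_{M_j} \ne \rho(t)_{M_j}$ (or both) is at most

$$4\binom{t}{k/2}\Bigl(\frac1n\Bigr)^{k/2},$$

since an ordered sequence of $k/2$ vertices would need to be chosen either from the left toward $M_i$ or from the right toward $M_i$ or from the left or right toward $M_j$. The probability that a particular vertex is chosen at any step is $1/n$. Since $t \le (qnr)/(2e(q-1))$ and $k \ge 8er$, this is at most

$$4\Bigl(\frac{2et}{kn}\Bigr)^{k/2} \le e^{-k/2} \le n^{-24}.$$

Now

\begin{align*}
\mathrm{cov}_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i, Z_j) &= E_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i Z_j) - E_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i)\,E_{\mathcal{L}_{\mathrm{Gl},t}}(Z_j) \\
&= \Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i = 1 \wedge Z_j = 1) - \Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}(Z_i = 1)\Pr\nolimits_{\mathcal{L}_{\mathrm{Gl},t}}(Z_j = 1) \\
&\le \Pr\nolimits_{\mathcal{L}^{sc}_{\mathrm{Gl},t}}(Z_i = 1 \wedge Z_j = 1) + n^{-24} - \bigl(\Pr\nolimits_{\mathcal{L}^{sc}_{\mathrm{Gl},t}}(Z_i = 1) - n^{-24}\bigr)\bigl(\Pr\nolimits_{\mathcal{L}^{sc}_{\mathrm{Gl},t}}(Z_j = 1) - n^{-24}\bigr) \\
&\le \mathrm{cov}_{\mathcal{L}^{sc}_{\mathrm{Gl},t}}(Z_i, Z_j) + 4n^{-24} \\
&= 4n^{-24}. \qquad \Box
\end{align*}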



7. Optimal mixing for Glauber and scan when $q > 4$. Let $G$ be the $n$-vertex path. For $q > 4$, Lemma 1 of [22] shows that Glauber dynamics mixes in $O(n\log n)$ steps. For scan, we use the coupling from Section 6.3. Consider a pair $(\sigma,\tau) \in S$ which disagrees at a single vertex $i$. Obtain $\sigma'$ and $\tau'$ by scanning left to right, starting at vertex $\max\{1, i-1\}$. Lemma 16 shows that $E[\mathrm{Ham}(\sigma',\tau')] \le 3/4$. This implies the following theorem (by path coupling).

Theorem 26. Let $G$ be the $n$-vertex path, and let $q > 4$. Consider the Markov chain $\mathcal{M}_{\to}$ on the state space $\Omega^+$. Then $\mathrm{Mix}(\mathcal{M}_{\to},\varepsilon) \le 4\log(n\varepsilon^{-1})$.

8. $H$-coloring: $O(n^5)$ updates or scans suffice. Let $H$ be a fixed graph, possibly with self-loops, and let $\Omega$ be the set of $H$-colorings of the graph $G$. These are the homomorphisms from $G$ to $H$; see [7, 15, 19] for details. We can extend the dynamics $\mathcal{M}_{\mathrm{Gl}}$ and $\mathcal{M}_{\to}$ to the domain of $H$-coloring by modifying the procedure Metropolis($v$) from Section 2. In particular, a proposed color $c$ (which is a vertex of $H$) is accepted if and only if every neighbor $w$ of $v$ is colored with some neighbor $c_w$ of $c$. The original dynamics corresponds to the situation in which $H$ is a $q$-clique with no self-loops.

Suppose that $H$ is connected. Let $G$ be the $n$-vertex path. If $H$ has an odd cycle, then Glauber dynamics and systematic scan are both ergodic on $\Omega$, the set of $H$-colorings of $G$. In this case we say that any two colorings, $\sigma \in \Omega$ and $\tau \in \Omega$, are compatible. If $H$ does not have an odd cycle, then it is bipartite. Neither dynamics is ergodic on $\Omega$. However, the $H$-colorings can be partitioned in a natural way into two subsets, such that Glauber and scan are both ergodic on either subset. In particular, the $H$-colorings are partitioned as follows. Two $H$-colorings $\sigma \in \Omega$ and $\tau \in \Omega$ are compatible if $\sigma_1$ and $\tau_1$ are chosen from the same side of the bipartition of $H$. Our aim is to show rapid mixing on the set(s) of compatible colorings.

Let $h = |V(H)|$. Define $t$ as follows:

$$t = \begin{cases} 4h - 1, & \text{if $H$ is not bipartite and $n$ is even;} \\ 2h - 1, & \text{if $H$ is bipartite and $n$ is even;} \\ 4h, & \text{if $H$ is not bipartite and $n$ is odd;} \\ 2h, & \text{if $H$ is bipartite and $n$ is odd.} \end{cases}$$

Note that $n + t$ is always odd.

Lemma 27. For any two compatible $H$-colorings $\sigma$ and $\tau$, there is a $t$-edge path in $H$ from $\sigma_n$ to $\tau_1$.

Proof. We look at each of the four cases.

$H$ is not bipartite and $n$ is even: Let $c$ be some point on an odd-length cycle. Go from $\sigma_n$ to $c$ in at most $h - 1$ edges. Also, go from $c$ to $\tau_1$ in at most $h - 1$ edges. If the constructed path has an even number of edges, go around the cycle using at most $h$ more edges. Now go back and forth on the last edge to make the total length equal to $t$.

$H$ is bipartite and $n$ is even: Note that $\sigma_n$ and $\tau_1$ are on opposite sides of the bipartition. Go from $\sigma_n$ to $\tau_1$ in at most $h - 1$ edges and go back and forth on the last edge.

$H$ is not bipartite and $n$ is odd: Let $c$ be some point on an odd-length cycle. Go from $\sigma_n$ to $c$ in at most $h - 1$ edges. Also, go from $c$ to $\tau_1$ in at most $h - 1$ edges. If the constructed path has an odd number of edges, go around the cycle using at most $h$ more edges. Now go back and forth on the last edge to make the total length equal to $t$.

$H$ is bipartite and $n$ is odd: Note that $\sigma_n$ and $\tau_1$ are on the same side of the bipartition. Go from $\sigma_n$ to $\tau_1$ in at most $h - 1$ edges and go back and forth on the last edge. □
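The existence claim of Lemma 27 can be checked mechanically on small examples. The sketch below is an illustration with hypothetical names: $H$ is given as an adjacency dictionary, `walk_length` implements the four-case definition of $t$, and `has_walk` tests whether a walk with exactly the given number of edges joins two vertices of $H$ (walks may repeat edges, as in the "back and forth" step of the proof).

```python
def walk_length(h, bipartite, n):
    """The quantity t: 4h or 2h, minus 1 when n is even."""
    base = 2 * h if bipartite else 4 * h
    return base - 1 if n % 2 == 0 else base

def has_walk(adj, start, end, length):
    """True if H has a walk with exactly `length` edges from start to end."""
    frontier = {start}
    for _ in range(length):
        frontier = {w for u in frontier for w in adj[u]}
    return end in frontier

# Example graphs: a 5-cycle (not bipartite) and a 3-vertex path (bipartite).
five_cycle = {i: [(i + 1) % 5, (i - 1) % 5] for i in range(5)}
three_path = {0: [1], 1: [0, 2], 2: [1]}
```

On the 5-cycle every pair of vertices is joined by a walk of length $t$ in both parity cases, whereas on the 3-vertex path a walk of odd length joins only vertices on opposite sides of the bipartition, matching the case distinction in the proof.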

8.1. Constructing canonical paths. Let $\Omega'$ be the state space. It is either the set of all proper colorings (if $H$ is not bipartite) or it is one of the two maximum sets of compatible colorings (if $H$ is bipartite). We will use the canonical paths method, which can be viewed as a special case of comparison in which we compare $\mathcal{M}_{\mathrm{Gl}}$ to the uniform random walk on $\Omega'$. Thus, for each $\sigma \in \Omega'$ and $\tau \in \Omega'$, we will construct a canonical path $\gamma_{\sigma,\tau}$ from $\sigma$ to $\tau$.

First, let $\sigma_n c_1 \cdots c_{t-1} \tau_1$ be some $t$-edge path from $\sigma_n$ to $\tau_1$ and let $z_1 z_2 \cdots z_{2n+t-1}$ denote $\sigma_1 \cdots \sigma_n c_1 \cdots c_{t-1} \tau_1 \cdots \tau_n$. Let $Z_i$ denote the $H$-coloring $z_i z_{i+1} \cdots z_{i+n-1}$, so $Z_1 = \sigma$ and $Z_{n+t} = \tau$. The path $\gamma_{\sigma,\tau}$ passes through $Z_1, Z_3, Z_5, \ldots, Z_{n+t}$. Moving from $Z_i$ to $Z_{i+2}$ can be implemented by $n$ Glauber transitions (applied to vertices 1 to $n$ in order). Let

$$A = \max_{(\alpha,\beta)} \frac{1}{\pi(\alpha)P_{\mathrm{Gl}}(\alpha,\beta)} \sum_{(\sigma,\tau)} \pi(\sigma)\pi(\tau)\,|\gamma_{\sigma,\tau}|, \tag{28}$$

where the max is over all Glauber-dynamics transitions $(\alpha,\beta)$ and the sum is over all pairs $(\sigma,\tau)$ such that $(\alpha,\beta)$ is on the canonical path $\gamma_{\sigma,\tau}$. By Theorem 2.1 of [9], we have $\lambda(\mathcal{M}_{\mathrm{Gl}}) \ge 1/A$. We now derive an upper bound on $A$.
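The sliding-window construction of the canonical path can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: $H$ is an undirected graph given as an adjacency dictionary, colorings are Python lists, vertices of the path are 0-indexed, and `find_walk` is a hypothetical helper that picks some $t$-edge walk from $\sigma_n$ to $\tau_1$ by dynamic programming.

```python
def is_h_coloring(adj, coloring):
    """True if consecutive path vertices carry H-adjacent colors."""
    return all(b in adj[a] for a, b in zip(coloring, coloring[1:]))

def find_walk(adj, start, end, length):
    """Return some walk with exactly `length` edges from start to end, or None."""
    reach = [{end}]                      # reach[j]: vertices that reach `end` in j steps
    for _ in range(length):
        prev = reach[-1]
        reach.append({u for u in adj if any(w in prev for w in adj[u])})
    if start not in reach[length]:
        return None
    walk = [start]
    for j in range(length, 0, -1):
        walk.append(next(w for w in adj[walk[-1]] if w in reach[j - 1]))
    return walk

def canonical_path(adj, sigma, tau, t):
    """List of colorings visited when sliding the window from sigma to tau."""
    n = len(sigma)
    walk = find_walk(adj, sigma[-1], tau[0], t)
    if walk is None:
        raise ValueError("no t-edge walk from sigma_n to tau_1")
    z = list(sigma) + walk[1:-1] + list(tau)     # z_1 ... z_{2n+t-1}
    states = [list(sigma)]
    for start in range(0, n + t - 1, 2):         # window Z_{start+1} -> Z_{start+3}
        for v in range(n):                       # vertices updated in order
            nxt = states[-1].copy()
            nxt[v] = z[start + 2 + v]
            states.append(nxt)
    return states
```

Every intermediate state is a valid $H$-coloring because, after vertex $v$ is updated, vertices $v$ and $v+1$ carry consecutive entries of the sequence $z$, which are adjacent in $H$. The number of transitions is $n(n+t-1)/2$, consistent with the length bound used for $|\gamma_{\sigma,\tau}|$.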

The three stationary probabilities in (28) are all $1/|\Omega'|$. Furthermore, every canonical path $\gamma_{\sigma,\tau}$ satisfies $|\gamma_{\sigma,\tau}| \le \frac{n+t}{2}\,n$. Finally, $P_{\mathrm{Gl}}(\alpha,\beta) = \frac{1}{nh}$. Plugging this into (28), we get

$$A \le n\,\frac{n+t}{2}\,\frac{nh}{|\Omega'|}\,\max_{(\alpha,\beta)} \sum_{(\sigma,\tau)} 1.$$

We will show that the number of pairs $(\sigma,\tau)$ using transition $(\alpha,\beta)$ is $O(n|\Omega'|)$, from which we can conclude

$$A = O(n^4), \tag{29}$$

viewing $h$ as constant. The method we use is standard: each canonical path through $(\alpha,\beta)$ will be assigned a unique "encoding" chosen from a set of $O(n|\Omega'|)$ encodings.

So now fix $(\alpha,\beta)$ and consider the set of all canonical paths that use transition $(\alpha,\beta)$. We show how to encode a typical such path, from $\sigma$ to $\tau$, say. Let $\tau_n d_1 \cdots d_{t-1} \sigma_1$ be some $t$-edge path in $H$ from $\tau_n$ to $\sigma_1$ and let $\bar z_1 \bar z_2 \cdots \bar z_{2n+t-1}$ denote the path $\tau_1 \cdots \tau_n d_1 \cdots d_{t-1} \sigma_1 \cdots \sigma_n$. Let $\bar Z_i$ denote the $H$-coloring $\bar z_i \bar z_{i+1} \cdots \bar z_{i+n-1}$.

The encoding of the canonical path from $\sigma$ to $\tau$ consists of the following information:

• $i$, indicating that the current transition is on the path from $Z_i$ to $Z_{i+2}$, and the colors $z_i$ and $z_{i+1}$,

• $\bar Z_i$, and

• the colors $\sigma_{i-t+1}, \ldots, \sigma_{i-1}$ and $\tau_{i-t+1}, \ldots, \tau_{i-1}$.

Given the transition $(\alpha,\beta)$ and the values of $i$, $z_i$ and $z_{i+1}$, we can deduce $Z_i$. From $Z_i$ and $\bar Z_i$ and the extra colors, we can deduce $\sigma$ and $\tau$. Thus, the number of pairs $(\sigma,\tau)$ using the given transition is at most the number of encodings, which is $O(n|\Omega'|)$ as required, so we have now established (29).

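Note that $\mathcal{M}_{\mathrm{Gl}}$ is reversible. Let $1 = \beta_0 > \beta_1 \ge \cdots \ge \beta_{|\Omega'|-1} \ge -1$ be the eigenvalues of its transition matrix $P_{\mathrm{Gl}}$. Since $1 - \beta_1 = \lambda(\mathcal{M}_{\mathrm{Gl}})$, we have $1/(1-\beta_1) \le A$. To bound the mixing time of $\mathcal{M}_{\mathrm{Gl}}$, we also need an upper bound on $1/(1+\beta_{|\Omega'|-1})$. This is an easy application of Proposition 2 of [10] since, for every $\alpha \in \Omega'$, we have $P_{\mathrm{Gl}}(\alpha,\alpha) \ge 1/h$.

In particular, for every $\sigma \in \Omega'$, we define the (odd-length) canonical path from $\sigma$ to itself to be the single transition $P_{\mathrm{Gl}}(\sigma,\sigma)$. Proposition 2 of [10] then gives

$$\frac{1}{1+\beta_{|\Omega'|-1}} \le \frac12 \max_{\alpha} \frac{1}{P_{\mathrm{Gl}}(\alpha,\alpha)} \le \frac{h}{2}.$$

Combining this with (29), Proposition 1(i) of [26] gives

$$\mathrm{Mix}_\sigma(\mathcal{M}_{\mathrm{Gl}},\varepsilon) \le \frac{1}{1-\beta_{\max}}\bigl(\ln\pi(\sigma)^{-1} + \ln\varepsilon^{-1}\bigr) = O(n^4)\bigl(\ln\pi(\sigma)^{-1} + \ln\varepsilon^{-1}\bigr).$$

Thus, we have the following theorem.

Theorem 28. Let $H$ be a fixed connected graph. Let $G$ be the $n$-vertex path. Let $\Omega'$ be the state space of $\mathcal{M}_{\mathrm{Gl}}$, which is either the set of all proper $H$-colorings of $G$ (if $H$ is not bipartite) or one of the two maximum sets of compatible colorings (if $H$ is bipartite). Consider the Markov chain $\mathcal{M}_{\mathrm{Gl}}$ on the state space $\Omega'$. Then

$$\mathrm{Mix}(\mathcal{M}_{\mathrm{Gl}},\varepsilon) = O(n^5\ln\varepsilon^{-1}).$$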
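The passage from the $O(n^4)$ factor to the $O(n^5)$ bound uses only the size of the state space. A short sketch of that step, assuming $\pi$ is uniform on $\Omega'$, that $h$ is treated as a constant, and that $\varepsilon$ is bounded away from 1 (say $\varepsilon \le 1/e$):

```latex
\begin{align*}
  \pi(\sigma)^{-1} = |\Omega'| \le h^{n}
    &\;\Longrightarrow\; \ln\pi(\sigma)^{-1} \le n\ln h = O(n), \\
  O(n^{4})\bigl(\ln\pi(\sigma)^{-1} + \ln\varepsilon^{-1}\bigr)
    &= O\bigl(n^{5} + n^{4}\ln\varepsilon^{-1}\bigr)
     = O\bigl(n^{5}\ln\varepsilon^{-1}\bigr).
\end{align*}
```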

In Section 10.1 we will show how to use our lower bound $\lambda(\mathcal{M}_{\mathrm{Gl}}) \ge 1/A$ to get a corresponding lower bound on $\lambda(\mathcal{M}_{\to})$. This will imply that the mixing time of systematic scan is also $O(n^5\ln\varepsilon^{-1})$, though, for technical reasons (since scan is not reversible), we state the result in continuous time. See Theorem 31 in Section 10.1 for details.

8.2. Special case. Suppose that $H$ is an odd cycle of length $k$. We noted at the beginning of Section 8 that Glauber and scan are ergodic on $\Omega$ and Section 8.1 shows that the mixing time is $O(n^5)$. In fact, the analysis for 3-coloring translates directly to the case of a $k$-cycle so we get the following analog of Theorems 1 and 3.

Theorem 29. Let $H$ be an odd cycle. Let $G$ be the $n$-vertex path. Let $\Omega'$ be the set of $H$-colorings of $G$. Consider the Markov chain $\mathcal{M}_{\mathrm{Gl}}$ on the
state space 0'. Mix(A^Gb \) = 6(n 3 log?i). 
The generalization of the proofs of Theorems 1 and 3 is straightforward. In Section 4.1 each configuration XgT corresponds to $k$ colorings. In Section 4.2 the height $h_i$ of every vertex $i$ satisfies hi = (mod $k$). The quantity $B$ in Section 4.3 is increased by a factor of $k$. A similar result holds for scan.

9. Directed $H$-coloring. It is natural to ask whether the $H$-coloring results could be generalized, for example, to directed $H$-coloring. The answer is no. To illustrate this, we give an example of a directed $H$ for which the dynamics is not ergodic on the $n$-vertex path $G$, and another example of a directed $H$ for which Glauber is ergodic, but mixes slowly.

For the first example, let $H$ have vertex set $\{x, y, z\}$ and edge set $\{(x,y), (y,z), (z,x)\}$. Now the three possible colorings of $G$ are

$$xyzxyz\ldots, \quad yzxyzx\ldots \quad\text{and}\quad zxyzxy\ldots.$$

These are not connected by either Glauber or scan moves.

For the second example, let the vertices of $H$ be $\{x, b_1, \ldots, b_k, c_1, \ldots, c_k\}$. Let the edges of $H$ consist of an edge from $x$ to every vertex (including itself), a directed clique on $B = \{b_1, \ldots, b_k\}$ and a directed clique on $C = \{c_1, \ldots, c_k\}$. Let $X$ be the singleton set $\{x\}$. The $H$-colorings of $G$ correspond to the length-$n$ strings satisfying the regular expression $X^*B^* \cup X^*C^*$. Let $A$ be the set of $H$-colorings satisfying the regular expression $X^*B^+$. (A coloring in $A$ starts out with a possibly empty sequence of color-$x$ vertices, then contains a nonempty sequence of vertices with colors from $B$.) Let $M$ be the set of all colorings with at most one color from $B \cup C$. Since $B$ and $C$ are the same size, $\pi(A) \le 1/2$. Furthermore, for $\sigma \in A \setminus M$ and $\tau \in \bar{A} \setminus M$, $P_{\mathrm{Gl}}(\sigma,\tau) = 0$ and $P_{\to}(\sigma,\tau) = 0$. Claim 2.3 of [12] shows that the mixing time of both of these chains is at least $\pi(A)/(8\pi(M))$. Now

$$\pi(A) = \frac{|A|}{|\Omega|} = \frac{|A|}{2|A|+1} \ge \frac{|A|}{3|A|} = \frac13.$$

Also, 

which completes the proof. 
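The counting behind $\pi(A) = |A|/(2|A|+1)$ can be reproduced by brute force for small $k$ and $n$. The sketch below takes the regular expression $X^*B^* \cup X^*C^*$ at face value (so any string over $B$ is allowed after the leading $x$'s); the letter encoding and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def in_language(word):
    """True if word matches X* B* or X* C*; letters are ('x',), ('b', i) or ('c', i)."""
    rest = list(word)
    while rest and rest[0][0] == "x":
        rest.pop(0)
    kinds = {letter[0] for letter in rest}
    return kinds <= {"b"} or kinds <= {"c"}

def enumerate_counts(k, n):
    """Return (|Omega|, |A|) by enumerating all length-n words."""
    alphabet = [("x",)] + [("b", i) for i in range(k)] + [("c", i) for i in range(k)]
    words = [w for w in product(alphabet, repeat=n) if in_language(w)]
    in_A = [w for w in words if w[-1][0] == "b"]   # matches X* B+
    return len(words), len(in_A)

def size_A(k, n):
    """Closed form: choose the length j >= 1 of the B-suffix, then its letters."""
    return sum(k ** j for j in range(1, n + 1))
```

For $k = 2$ and $n = 3$ this gives $|A| = 14$ and $|\Omega| = 29 = 2|A| + 1$, where the extra 1 is the all-$x$ coloring, so $\pi(A) = 14/29$ lies between $1/3$ and $1/2$.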



10. Comparisons of scan and Glauber for general graphs. From the results obtained so far, it seems as if one sweep of systematic scan is equivalent to a linear number of Glauber updates. In the majority of cases examined (Sections 4-7), we have obtained tight asymptotic bounds, and we know the equivalence is exact. Where we don't have tight bounds (Section 8), at least our results are consistent with this supposed equivalence. It is natural to wonder whether a result can be framed that relates scan and Glauber in a more general setting, where the graph $G$ is arbitrary.

In this section we use the comparison method of Diaconis and Saloff-Coste [9] to compare the optimal Poincaré constant $\lambda(\mathcal{M}_{\mathrm{Gl}})$ of Glauber dynamics to the optimal Poincaré constant $\lambda(\mathcal{M}_{\to})$ of scan. Ideally, we might hope for $\lambda(\mathcal{M}_{\to}) = \Theta(n\lambda(\mathcal{M}_{\mathrm{Gl}}))$. In fact, the best bounds we can prove lose a factor $n$ in either direction so we have a lower bound for $\lambda(\mathcal{M}_{\to})$ of $\Omega(\lambda(\mathcal{M}_{\mathrm{Gl}}))$, and an upper bound of $O(n^2\lambda(\mathcal{M}_{\mathrm{Gl}}))$. Moreover, for the lower bound, we need to assume $G$ has bounded degree.

10.1. Comparing scan to Glauber. 

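Theorem 30. Suppose $G$ has maximum degree $\Delta$. Let $\mathcal{M}_{\mathrm{Gl}}$ and $\mathcal{M}_{\to}$ be the Glauber dynamics and systematic scan applied to $H$-colorings of $G$ for a fixed but arbitrary $H$. Then $\lambda(\mathcal{M}_{\mathrm{Gl}}) \le 4q^{\Delta+1}\lambda(\mathcal{M}_{\to})$.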

Proof. Suppose $\sigma \to \sigma'$ is a possible Glauber transition, that is, $P_{\mathrm{Gl}}(\sigma,\sigma') > 0$. Let $i$ be the unique vertex satisfying $\sigma_i \ne \sigma'_i$. Say that $\tau \in \Omega$ is between $\sigma$ and $\sigma'$ if $\sigma \to \tau$ is a possible scan transition, and additionally: (i) $\tau_i = \sigma'_i$ and (ii) $\tau_j = \sigma_j$ for all $j \sim i$, where "$\sim$" denotes adjacency in $G$. Denote by $B(\sigma,\sigma')$ the set of states between $\sigma$ and $\sigma'$. Consider a scan transition from state $\sigma$, and denote by $\mathcal{E}_i$ the event that, for all $k \in \{i\} \cup \{j : j \sim i\}$, the color proposed by Metropolis($k$) is $\sigma'(k)$. Similarly, consider a reverse scan transition from state $\sigma'$, and denote by $\mathcal{F}_i$ the event that, for all $k \in \{i\} \cup \{j : j \sim i\}$, the color proposed by Metropolis($k$) is $\sigma'(k)$.

The following observations are easy to verify:

• Conditioned on $\mathcal{E}_i$, a scan transition from state $\sigma$ is certain to result in a state $\tau \in B(\sigma,\sigma')$.

• For all $\tau \in B(\sigma,\sigma')$, $P_{\to}(\tau,\sigma' \mid \mathcal{F}_i) = P_{\to}(\sigma,\tau \mid \mathcal{E}_i)$.

• $\Pr(\mathcal{E}_i) \ge q^{-(\Delta+1)}$ and $\Pr(\mathcal{F}_i) \ge q^{-(\Delta+1)}$.

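It follows from these three observations that

\begin{align*}
\sum_{\tau\in B(\sigma,\sigma')} \min\{P_{\to}(\sigma,\tau), P_{\to}(\tau,\sigma')\}
&\ge \sum_{\tau\in B(\sigma,\sigma')} \min\{\Pr(\mathcal{E}_i)P_{\to}(\sigma,\tau \mid \mathcal{E}_i), \Pr(\mathcal{F}_i)P_{\to}(\tau,\sigma' \mid \mathcal{F}_i)\} \\
&\ge q^{-(\Delta+1)} \sum_{\tau\in B(\sigma,\sigma')} \min\{P_{\to}(\sigma,\tau \mid \mathcal{E}_i), P_{\to}(\tau,\sigma' \mid \mathcal{F}_i)\} \\
&= q^{-(\Delta+1)} \sum_{\tau\in B(\sigma,\sigma')} P_{\to}(\sigma,\tau \mid \mathcal{E}_i) \\
&= q^{-(\Delta+1)}.
\end{align*}

Then for any $f : \Omega \to \mathbb{R}$,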

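\begin{align*}
\mathcal{E}_{\mathcal{M}_{\mathrm{Gl}}}(f,f) &= \frac12 \sum_{\sigma,\sigma'\in\Omega} \pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma')\bigl(f(\sigma)-f(\sigma')\bigr)^2 \\
&\le \frac{q^{\Delta+1}}{2} \sum_{\sigma,\sigma'\in\Omega} \pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma') \sum_{\tau\in B(\sigma,\sigma')} \min\{P_{\to}(\sigma,\tau), P_{\to}(\tau,\sigma')\}\bigl(f(\sigma)-f(\sigma')\bigr)^2 \\
&\le q^{\Delta+1} \sum_{\sigma,\sigma'\in\Omega} \pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma') \sum_{\tau\in B(\sigma,\sigma')} \Bigl[P_{\to}(\sigma,\tau)\bigl(f(\sigma)-f(\tau)\bigr)^2 + P_{\to}(\tau,\sigma')\bigl(f(\tau)-f(\sigma')\bigr)^2\Bigr] \tag{30} \\
&= q^{\Delta+1} \sum_{\sigma,\sigma'\in\Omega} \pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma') \sum_{\tau\in B(\sigma,\sigma')} P_{\to}(\sigma,\tau)\bigl(f(\sigma)-f(\tau)\bigr)^2 \\
&\quad + q^{\Delta+1} \sum_{\sigma',\sigma\in\Omega} \pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma') \sum_{\tau\in B(\sigma,\sigma')} P_{\to}(\tau,\sigma')\bigl(f(\tau)-f(\sigma')\bigr)^2 \\
&= 2q^{\Delta+1} \sum_{\sigma,\sigma'\in\Omega} \pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma') \sum_{\tau\in B(\sigma,\sigma')} P_{\to}(\sigma,\tau)\bigl(f(\sigma)-f(\tau)\bigr)^2 \tag{31} \\
&= 2q^{\Delta+1} \sum_{\sigma,\tau\in\Omega} \pi(\sigma)P_{\to}(\sigma,\tau)\bigl(f(\sigma)-f(\tau)\bigr)^2 \sum_{\sigma' : \tau\in B(\sigma,\sigma')} P_{\mathrm{Gl}}(\sigma,\sigma') \\
&\le 2q^{\Delta+1} \sum_{\sigma,\tau\in\Omega} \pi(\sigma)P_{\to}(\sigma,\tau)\bigl(f(\sigma)-f(\tau)\bigr)^2 \tag{32} \\
&= 4q^{\Delta+1}\,\mathcal{E}_{\mathcal{M}_{\to}}(f,f).
\end{align*}

Inequality (30) applies the fact that $\frac12(a-b)^2 \le (a-\xi)^2 + (\xi-b)^2$ for all $\xi$. Inequality (31) uses the fact that Glauber is time reversible, that is, that $\pi(\sigma)P_{\mathrm{Gl}}(\sigma,\sigma') = \pi(\sigma')P_{\mathrm{Gl}}(\sigma',\sigma)$, for all $\sigma,\sigma' \in \Omega$ and the fact that $B(\sigma,\sigma') = B(\sigma',\sigma)$. Inequality (32) seems crude at first sight, but it is not obvious how to do better: the knowledge of $\tau$ does little to constrain $\sigma'$. □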
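The elementary inequality invoked for (30) follows from expanding a square; a one-line derivation, valid for all real $a$, $b$ and $\xi$, is sketched here.

```latex
\begin{align*}
  (a-b)^2 = \bigl((a-\xi) + (\xi-b)\bigr)^2
    &= (a-\xi)^2 + 2(a-\xi)(\xi-b) + (\xi-b)^2 \\
    &\le 2(a-\xi)^2 + 2(\xi-b)^2
    && \text{since } 2xy \le x^2 + y^2,
\end{align*}
% dividing by 2 gives (1/2)(a-b)^2 <= (a-xi)^2 + (xi-b)^2.
```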






The inverse of $\lambda(\mathcal{M})$ is closely related to the mixing time of $\mathcal{M}$. Much is known about the precise relationship between these quantities; see, for example, the inequalities in [1, 9, 10, 12, 16, 23, 26]. Some known results only apply when $\mathcal{M}$ is reversible, or when the eigenvalues of its transition matrix $P$ are positive. Our survey paper [14] gives inequalities between Poincaré constants and mixing times in both the general case and the reversible case. We will not repeat the details or trace the development of the ideas here, but we mention a few simple facts that are useful for us. Slightly stronger bounds can be obtained with more effort. Let $\widetilde{\mathcal{M}}_{\to}$ be the continuization of $\mathcal{M}_{\to}$ as defined in [2], Chapter 2, page 5. Essentially, this is just $\mathcal{M}_{\to}$, except that the holding time between discrete transitions is exponential with mean 1. It is a classical result (see, e.g., [23], pages 55 and 63) that the mixing time of $\widetilde{\mathcal{M}}_{\to}$ is bounded as follows:

$$\mathrm{Mix}_x(\widetilde{\mathcal{M}}_{\to},\varepsilon) \le \frac{1}{2\lambda(\mathcal{M}_{\to})}\bigl(2\ln(1/\varepsilon) + \ln(1/\pi(x))\bigr). \tag{33}$$

Combining (33), Theorem 30 and the upper bound $1/\lambda(\mathcal{M}_{\mathrm{Gl}}) = O(n^4)$ from Section 8.1, we get the following:

Theorem 31. Let $H$ be a fixed connected graph. Let $G$ be the $n$-vertex path. Let $\Omega'$ be the state space of $\mathcal{M}_{\mathrm{Gl}}$. It is either the set of all proper $H$-colorings of $G$ (if $H$ is not bipartite) or it is one of the two maximum sets of compatible colorings (if $H$ is bipartite). Consider the Markov chain $\widetilde{\mathcal{M}}_{\to}$ on the state space $\Omega'$. Then

$$\mathrm{Mix}(\widetilde{\mathcal{M}}_{\to},\varepsilon) = O(n^5\ln\varepsilon^{-1}).$$

Let $\mathcal{M}^{ZZ}_{\mathrm{Gl}}$ be the "lazy" version of Glauber dynamics from page 53 of [23]. In each step, the lazy Markov chain stays where it is with probability 1/2, and otherwise makes the transition specified in the definition of $\mathcal{M}_{\mathrm{Gl}}$. We introduce the lazy chain to keep the eigenvalues positive. See [14] for inequalities which avoid this device. The following inequality from [14] is similar to Proposition 1(ii) of [26]:

< maxMix x \M G \,— 



X(M G i) AGMf)- - x \ w '2e 

Combining this with Theorem 30 and with (33), we find that, for bounded-degree graphs $G$, the mixing time of $\widetilde{\mathcal{M}}_{\to}$ is at most $O(n)$ times the mixing time of $\mathcal{M}^{ZZ}_{\mathrm{Gl}}$. Perhaps this result can be improved by a factor of $n^2$.


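Theorem 32. Suppose $G$ is arbitrary. Let $\mathcal{M}_{\mathrm{Gl}}$ and $\mathcal{M}_{\to}$ be the Glauber dynamics and systematic scan applied to $H$-colorings of $G$ for a fixed but arbitrary $H$. Then $\lambda(\mathcal{M}_{\to}) \le n^2 q\,\lambda(\mathcal{M}_{\mathrm{Gl}})$.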

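Proof. Let $\sigma,\sigma' \in \Omega$ be a pair of states for which $P_{\to}(\sigma,\sigma') > 0$. There is a natural canonical path $\gamma_{\sigma,\sigma'} = (\sigma = \tau^0 \to \tau^1 \to \cdots \to \tau^n = \sigma')$ from $\sigma$ to $\sigma'$ using Glauber transitions, in which $\tau^{i-1}$ differs from $\tau^i$ (if at all) only at vertex $i$. According to [9], Theorem 2.1, the quantity we need to bound is

$$A = \max_{(\tau,\tau')} \frac{1}{\pi(\tau)P_{\mathrm{Gl}}(\tau,\tau')} \sum_{\sigma,\sigma' : (\tau,\tau')\in\gamma_{\sigma,\sigma'}} \pi(\sigma)P_{\to}(\sigma,\sigma')\,|\gamma_{\sigma,\sigma'}| = n^2 q \max_{(\tau,\tau')} \sum_{\sigma,\sigma' : (\tau,\tau')\in\gamma_{\sigma,\sigma'}} P_{\to}(\sigma,\sigma'),$$

where we have used the facts that $\pi$ is uniform, $|\gamma_{\sigma,\sigma'}| = n$, and $P_{\mathrm{Gl}}(\tau,\tau') = 1/(nq)$. (Diaconis and Saloff-Coste state their theorem for time-reversible MCs, but their proof does not use time reversibility.) We shall demonstrate that $A \le n^2 q$, from which it follows that $\lambda(\mathcal{M}_{\to}) \le n^2 q\,\lambda(\mathcal{M}_{\mathrm{Gl}})$.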
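The canonical path used here simply replays a scan sweep one vertex at a time. A minimal sketch, with an illustrative function name and colorings stored as 0-indexed Python lists:

```python
def glauber_path(sigma, sigma_prime):
    """States tau^0, ..., tau^n: tau^j agrees with sigma_prime on the first j
    vertices and with sigma on the remaining n - j vertices."""
    n = len(sigma)
    return [list(sigma_prime[:j]) + list(sigma[j:]) for j in range(n + 1)]
```

Consecutive states $\tau^{j-1}$ and $\tau^{j}$ can differ only at vertex $j$, so each step is a (possibly null) Glauber transition and the path has length $n$. This sketch only builds the sequence of states; whether each intermediate state is a valid coloring depends on $\sigma'$ actually arising from a scan sweep started at $\sigma$.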

Regard $\tau$ and $\tau'$ as fixed, and suppose $\tau$ and $\tau'$ differ at vertex $i$. Denote by $\mathcal{E}_1^{\sigma}$ the event that the sequence

Metropolis(1), Metropolis(2), ..., Metropolis($i-1$)

takes $\sigma$ to $\tau$, and by $\mathcal{E}_2^{\sigma'}$ the event that

Metropolis($i+1$), Metropolis($i+2$), ..., Metropolis($n$)

takes $\tau'$ to $\sigma'$. Then $P_{\to}(\sigma,\sigma') \le \Pr(\mathcal{E}_1^{\sigma} \wedge \mathcal{E}_2^{\sigma'}) = \Pr(\mathcal{E}_1^{\sigma})\Pr(\mathcal{E}_2^{\sigma'})$, and

$$\sum_{\sigma,\sigma' : (\tau,\tau')\in\gamma_{\sigma,\sigma'}} P_{\to}(\sigma,\sigma') \le \sum_{\sigma} \Pr(\mathcal{E}_1^{\sigma}) \sum_{\sigma'} \Pr(\mathcal{E}_2^{\sigma'}).$$

The second sum above is clearly bounded by 1, since the events $\{\mathcal{E}_2^{\sigma'} : \sigma' \in \Omega\}$ are disjoint. In fact, the first sum is also bounded by 1, since $\Pr(\mathcal{E}_1^{\sigma})$ is equal to the probability that the sequence

Metropolis($i-1$), Metropolis($i-2$), ..., Metropolis(1)

takes $\tau$ to $\sigma$. So the terms of the first sum may also be viewed as probabilities of disjoint events. Thus, $A \le n^2 q$, as claimed. □

Combining Theorem 32 with inequalities of Diaconis and Stroock [10] and Sinclair [26], we get

$$\mathrm{Mix}_x(\mathcal{M}^{ZZ}_{\mathrm{Gl}},\varepsilon) \le n^2 q\,\frac{1}{\lambda(\mathcal{M}_{\to})}\ln\frac{1}{\varepsilon\pi(x)}. \tag{34}$$

This can be combined with the upper bound

$$\frac{1}{\lambda(\mathcal{M}_{\to})} \le \frac{2\bigl(\max_x \mathrm{Mix}_x(\mathcal{M}_{\to}, 1/e)\bigr)^2}{(1/2 - 1/e)^2}. \tag{35}$$

The square of the mixing time in (35) is necessary in the general nonreversible case (see [14]), though of course better inequalities might apply to the particular chain $\mathcal{M}_{\to}$. Combining (34) and (35), we get a weak inequality which shows that the mixing time of (lazy) Glauber dynamics is at most $O(n^3)$ times the square of the mixing time of systematic scan.

Note that the proofs of Theorems 30 and 32 are actually about Dirichlet 
forms rather than about Poincare constants, so the same inequalities apply 
to the log-Sobolev constant. 

Acknowledgment. The authors thank Persi Diaconis for comments on a
draft of this article.